17 Best Crime Datasets for Machine Learning

Article by Lucas Scott | August 27, 2019

For those looking to build text analysis models or analyze crime rates and trends over a specific area or time period, we have compiled a list of the 17 best crime datasets made available for public use. The datasets come from various locations around the world, and most cover long time periods.

 

Canada Crime Datasets

Crime in Vancouver – This dataset covers crime in Vancouver, Canada from 2003 to July 2017. The data contains the type of crime, date, street it occurred on, coordinates, and district. 

Ontario Crime Statistics – Available on the Government of Canada website, this dataset includes crime statistics from the province of Ontario from 1998 to 2018. The data includes crime rate per 100,000 people, number of cleared cases, cases cleared by charge, people charged, adults charged, youth charged, and more.
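Since this dataset reports crime as a rate per 100,000 people, it can help to see how such a rate is derived from raw counts. The following is a minimal sketch; the function name and the figures in the usage line are made up for illustration and do not come from the Ontario data.

```python
def rate_per_100k(incidents: int, population: int) -> float:
    """Return the number of incidents per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply first so that round figures stay exact in floating point.
    return incidents * 100_000 / population


# Hypothetical figures: 1,250 incidents in a population of 500,000.
example_rate = rate_per_100k(1_250, 500_000)
print(example_rate)
```

Normalizing by population this way makes regions of different sizes comparable, which raw incident counts do not.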

Toronto Assault Crime – Provided by the Toronto Police Service through its Public Safety Data Portal, this dataset includes an interactive map plotting every assault incident from 2014 to 2018. The data is downloadable as a spreadsheet with over 59,000 rows.

 

United Kingdom Crime Datasets

Crime in England and Wales – Published by the Home Office, this dataset contains crime statistics from 2008 to 2009. The data was compiled from the British Crime Survey and from crime data recorded by the police. The dataset includes statistics on violent crime, property crime, and more in XLS format.

London Crime – This dataset contains 13 million rows of data with the following columns: borough, type of crime, and date. 

 

United States Crime Datasets

Austin Crime Statistics – With data covering crimes reported in Austin between 2014 and 2016, this dataset contains 159,000 rows of data with 18 columns. The data includes location info, date and time, area, district, and description of the crime.

Baton Rouge Crime –  This crime dataset contains all incidents handled by the Baton Rouge Police Department. The crimes covered in this dataset include: narcotics, theft, assault, nuisance, vice, battery, damage to property, sexual assaults, and homicide. Due to privacy issues for assault victims, the data is not geocoded. 

Crimes in Boston – This Boston crime dataset includes information about incidents to which Boston PD officers responded from August 2015 to date. The dataset includes information about the type of crime, the date and time of the crime, and the location where it occurred. The CSV file includes the following columns: incident number, offense code, offense code group, offense description, district, reporting area, shooting, date, year, month, day of the week, hour, street, latitude, and longitude.
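A column layout like this lends itself to simple tallies, such as counting incidents per offense code group. Below is a rough sketch using only the Python standard library. The header names (OFFENSE_CODE_GROUP, DAY_OF_WEEK, and so on) and the sample rows are assumptions for illustration; check the actual header row of the file you download.

```python
import csv
import io
from collections import Counter


def count_by_column(handle, column):
    """Count how many rows share each value in the given CSV column."""
    reader = csv.DictReader(handle)
    return Counter(row[column] for row in reader)


# Made-up sample rows standing in for the real CSV file.
sample = io.StringIO(
    "INCIDENT_NUMBER,OFFENSE_CODE_GROUP,DAY_OF_WEEK,HOUR\n"
    "X001,Larceny,Monday,13\n"
    "X002,Vandalism,Monday,22\n"
    "X003,Larceny,Friday,9\n"
)
counts = count_by_column(sample, "OFFENSE_CODE_GROUP")
print(counts.most_common())
```

With a real file, the `io.StringIO` sample would be replaced by something like `open(path, newline="")`, where `path` points to the downloaded CSV.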

Crimes in Chicago – The Chicago crime dataset includes reported crimes dating back to 2001 and is updated regularly, with a seven-day lag before new records appear. The dataset includes location info, incident type and description, year of the incident, and date the record was updated.

Denver Crime Data – Updated regularly, the Denver Crime Dataset covers criminal offenses in Denver for the previous five years plus the current year. The data within this crime dataset comes from the National Incident-Based Reporting System and includes the following information: offense codes, offense types, date of crime, reported date, address, and location.

FBI National Incident-Based Reporting System (NIBRS) – This dataset is a great resource for crime or policing analysis in the United States. The original data has been cleaned and organized into one convenient database.

Los Angeles Crime and Arrest Data – Based on open data from the city of Los Angeles, this dataset includes crime data from 2010 to 2019. The dataset includes the report ID, arrest date, time, area, suspect data, type of charge, charge description, and location info. 

NYC Complaint Data – This New York City crime dataset includes all crimes reported to the New York City Police Department from 2006 to 2017. The data includes 6.5 million rows and 35 columns including: incident date, complaint number, location, coordinates, suspect info, victim info, and more. 

Oakland Crime Statistics – This dataset contains crime data from Oakland between 2011 and 2016. Each year has its own CSV file, for a combined total of over 1 million rows of data, with 10 to 11 columns per file.
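Because each year ships as its own CSV file and the column count varies between files, the files need to be merged with some care before analysis. The sketch below shows one possible approach using only the standard library. The function name, the added `source_year` field, and the sample column names are all hypothetical, not taken from the Oakland files.

```python
import csv
import io


def combine_yearly_csvs(files_by_year):
    """Merge per-year CSV files into one list of row dicts.

    files_by_year maps a year to an open text file or file-like object.
    Returns (columns, rows), where columns is the union of all headers.
    """
    rows = []
    columns = []  # union of column names, in first-seen order
    for year, handle in sorted(files_by_year.items()):
        reader = csv.DictReader(handle)
        for name in reader.fieldnames or []:
            if name not in columns:
                columns.append(name)
        for row in reader:
            row["source_year"] = year  # remember which file the row came from
            rows.append(row)
    # Pad rows from files that lack a column so every row has the same keys.
    for row in rows:
        for name in columns:
            row.setdefault(name, "")
    return columns + ["source_year"], rows


# Made-up samples: the second file has one extra column.
sample_2011 = io.StringIO("case_id,description\nA1,THEFT\n")
sample_2012 = io.StringIO("case_id,description,beat\nB1,BURGLARY,04X\n")
columns, rows = combine_yearly_csvs({2011: sample_2011, 2012: sample_2012})
print(columns)
```

Padding missing columns with empty strings keeps every row the same shape, at the cost of hiding which values were truly absent. A real pipeline might prefer an explicit missing-value marker.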

Open Baltimore Crime Data – This crime dataset is updated every week with a lag time of nine days to allow for changes to the data and processing time. The dataset covers crimes in Baltimore and has 16 columns of data, including date, crime code, location, description, coordinates, and number of incidents. 

Phoenix Crime Data – Updated daily, the Phoenix Crime Dataset is a CSV file that contains crime data from November 2015 to date with a seven-day lag. The data includes information about homicides, rapes, robberies, aggravated assaults, burglaries, thefts, motor vehicle thefts, arson, and drug offenses.

San Francisco Crime Classification – Containing crime data from 2003 to 2015, this dataset includes the following information: timestamp of incident, category, description of incident, day of the week, district, resolution, address, and coordinates.

 

Still can’t find the data you need? Lionbridge provides custom dataset creation for a variety of use cases. With a global multilingual crowd and 20 years of experience, we can provide accurate ground truth data at scale. Learn more about how we can help your project be an industry-leading success. 

