The novel Coronavirus which originated in Wuhan is making headlines all over the world, with live updates from major news publications multiple times a day. For those looking to track the rate of spread, or conduct other research about the virus, numerous datasets have been made available through public and paid platforms. This article will highlight some of the most widely-used coronavirus datasets, covering data from all countries with confirmed COVID-19 cases.*
Note: Many of the datasets below are compiled and maintained by Statista. On their website, basic statistics and file exports are free, but some features or certain datasets will not be available without a paid plan. Statista datasets will be clearly identified throughout the article.
*This article will be updated continuously as new COVID-19 datasets become available.
Global Coronavirus Datasets
Novel Coronavirus COVID-19 (2019-nCoV) Data Repository – This dataset is maintained by Johns Hopkins University and the ESRI Living Atlas Team. The data is compiled from multiple sources, such as the World Health Organization, China CDC, US CDC, Government of Canada, and more.
2019 Coronavirus data – This is a simple reformatting of the Johns Hopkins University dataset into organized CSV files. It contains data from January to February 2020.
150 Million COVID-19 Tweets – This dataset contains over 150 million tweets related to COVID-19, collected from March 11, 2020 onward. The crawled tweets span all languages, with English, Spanish, and French being the most prevalent.
Coronavirus Genome – This dataset is a simple TXT file containing the complete genome sequence of the virus that causes COVID-19.
COVID-CT-Dataset – From UC San Diego, this dataset contains 275 CT scans that are positive for COVID-19.
COVID-19 Timeseries+Lat/Lon – This dataset contains Coronavirus case data, complete with the data source, country and province data. Furthermore, it also includes the latitude and longitude coordinates of the countries where cases have been reported.
The Complete COVID-19 Dataset – A single CSV file containing worldwide Coronavirus case data that is updated every 24 hours.
(COVID-19) cases worldwide – This dataset contains the number of novel Coronavirus cases divided by country. This dataset is updated regularly. (STATISTA)
Dimensions COVID-19 – This repository contains all clinical trials, publications, and datasets relevant to Coronavirus from the Dimensions scholarly research database.
Public Coronavirus Twitter Dataset – This is the first public COVID-19 Twitter dataset. It includes Tweets about Coronavirus that have been collected continuously since January 22, 2020.
CORD-19 – From the Allen Institute for AI, CORD-19 is an open dataset consisting of over 45,000 scholarly articles about Coronavirus.
C3.ai COVID-19 Data Lake – From C3.ai, this resource is a unified data model that comes ready for analysis. The model brings together numerous reputable sources of COVID-19 data from all over the world and is available to download for free from the C3.ai website.
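As a quick illustration of working with a plain-text sequence file such as the Coronavirus Genome entry above, the sketch below computes the sequence length and GC fraction. The exact layout of that TXT file is not described here, so the code assumes one or more lines of bases, optionally preceded by a FASTA-style header line starting with ">". The function name and the sample string are hypothetical.

```python
from collections import Counter


def summarize_genome(text: str) -> dict:
    """Return sequence length, base counts, and GC fraction for a plain-text genome."""
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and FASTA-style header lines (an assumption about the layout).
        if not line or line.startswith(">"):
            continue
        chunks.append(line.upper())
    sequence = "".join(chunks)
    counts = Counter(sequence)
    gc = counts["G"] + counts["C"]
    return {
        "length": len(sequence),
        "counts": dict(counts),
        "gc_fraction": gc / len(sequence) if sequence else 0.0,
    }


# Made-up 12-base example, not real viral sequence data.
sample = ">hypothetical header\nATGC\nGGCC\nATAT\n"
summary = summarize_genome(sample)
print(summary["length"], summary["gc_fraction"])
```

To use it on a downloaded file, read the file's contents into a string and pass that string to summarize_genome.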
Australia Coronavirus Datasets
Tracking Coronavirus: Australia Data – From BNO News, this resource contains the map data and timeline information for COVID-19 cases in Australia.
Canada COVID-19 Datasets
Tracking Coronavirus: Canada Data – From BNO News, this dataset includes the map data and timeline information for COVID-19 cases in Canada.
China Coronavirus Datasets
Coronavirus: China and Rest of World – A Kaggle notebook that compares the rate of spread and the number of cured cases in China vs. the rest of the world. It also includes the datasets used to make the comparisons.
China Regions Map – This simple dataset contains GeoJSON data for regions in China. It can be used to help display Coronavirus cases in China by region. The data was taken from the larger dataset TopoJSON on GitHub.
Clinical Characteristics of COVID-19 in China – Published in The New England Journal of Medicine, this paper includes data on COVID-19 in China and how the country responded to the outbreak.
Fatality Rate – A small dataset which shows the fatality rate of COVID-19 in China as of February 11, 2020. (STATISTA)
Deaths and Recovered Cases – This dataset can be downloaded in XLS or PPT format and includes the number of novel Coronavirus infections, deaths, and recoveries in China by region. The dataset is updated regularly. (STATISTA)
Age distribution of COVID-19 patients – This graph shows the age distribution of Coronavirus patients in China as of February 11, 2020. (STATISTA)
Gender Distribution – A simple dataset showing the gender distribution of Coronavirus patients in China as of February 11, 2020. (STATISTA)
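To display cases by region with a GeoJSON file like the China Regions Map entry above, a common first step is to attach a case count to each region feature. The sketch below shows one way to do that with the standard library. The property key "name", the region names, and the counts are all assumptions made for illustration; check the actual property names in the file you download.

```python
import json

# Made-up miniature FeatureCollection; real region files carry polygon geometry.
geojson_text = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "Region A"},
         "geometry": {"type": "Point", "coordinates": [0.0, 0.0]}},
        {"type": "Feature", "properties": {"name": "Region B"},
         "geometry": {"type": "Point", "coordinates": [1.0, 1.0]}},
    ],
})

# Hypothetical case counts keyed by region name.
cases_by_region = {"Region A": 120}


def attach_cases(geojson: dict, cases: dict, name_key: str = "name") -> dict:
    """Write a 'cases' value into each feature's properties, defaulting to 0."""
    for feature in geojson["features"]:
        region = feature["properties"].get(name_key)
        feature["properties"]["cases"] = cases.get(region, 0)
    return geojson


regions = attach_cases(json.loads(geojson_text), cases_by_region)
for feature in regions["features"]:
    print(feature["properties"]["name"], feature["properties"]["cases"])
```

The enriched FeatureCollection can then be handed to whichever mapping tool you prefer to color regions by the "cases" property.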
Germany Coronavirus Datasets
COVID-19 Germany – This dataset covers the number of Coronavirus cases reported in Germany. (STATISTA)
Italy COVID-19 Datasets
Cases by Region – This graph shows Coronavirus cases by region in Italy. (STATISTA)
Deaths by Region – This graph shows Coronavirus deaths by region in Italy. (STATISTA)
Japan COVID-19 Datasets
Japan Coronavirus Data – This dataset includes the number of patients diagnosed with coronavirus in Japan as of March 6, 2020, by place of infection. (STATISTA)
South Korea Coronavirus Datasets
COVID-19 (South Korea) – This dataset consists of information provided by the KCDC (Korea Centers for Disease Control & Prevention). It includes data about confirmed COVID-19 cases in South Korea and the following columns: gender, birth year, region, date of confirmation, and date of discharge, among other data points.
UPDATE: This dataset is being restructured into the following format, with each line below listing the columns of one table:
caseid / province / city / group / infectioncase / confirmed / latitude / longitude
patientid / globalnum / sex / birthyear / age / country / province / city / disease / infectioncase / infectionorder / infectedby / contactnumber / confirmeddate / releaseddate / deceaseddate / state
patientid / globalnum / date / province / city / visit / latitude / longitude
code / province / city / latitude / longitude
date / time / test / negative / confirmed / released / deceased
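As an example of how the last table in the listing above (date / time / test / negative / confirmed / released / deceased) might be used, the sketch below derives daily new confirmed cases. It assumes the table is delivered as CSV and that the "confirmed" column holds cumulative totals; neither is stated above, so verify both against the actual files. The rows are made up.

```python
import csv
import io

# Made-up rows using the column names from the last line of the format listing.
sample_csv = """date,time,test,negative,confirmed,released,deceased
2020-03-01,16,100,80,10,1,0
2020-03-02,16,150,120,18,2,0
2020-03-03,16,210,170,23,4,1
"""


def daily_new_confirmed(csv_text: str) -> list:
    """Return (date, new_confirmed) pairs, assuming 'confirmed' is cumulative."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    result = []
    previous = None
    for row in rows:
        confirmed = int(row["confirmed"])
        # The first row has no earlier total to subtract from, so it is skipped.
        if previous is not None:
            result.append((row["date"], confirmed - previous))
        previous = confirmed
    return result


new_cases = daily_new_confirmed(sample_csv)
print(new_cases)
```

The same differencing approach applies to the "released" and "deceased" columns if they are cumulative as well.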
United States Coronavirus Datasets
US COVID-19 Daily Cases with Basemap – From Harvard University, this dataset contains daily COVID-19 cases with a United States basemap that includes state and county-level data.
California COVID-19 Hospital Data and Case Statistics – This California dataset includes information on total cases, deaths, and positive and suspected-positive COVID-19 patients, including those in intensive care units. This dataset is updated daily.
COVID-19 U.S. – This dataset includes information on confirmed Coronavirus cases in the United States. (STATISTA)
Tracking Coronavirus: U.S. Data – From BNO News, this resource contains the map data and timeline information for COVID-19 cases in the United States.
We hope that this article helped you find the data you were looking for. If you were looking for a certain country’s data that wasn’t listed above, please view the datasets within the “Global” section at the beginning of the article.
Remember, this Coronavirus dataset list will be updated when new datasets become available. To keep up with all future updates and other machine learning guides and news, please subscribe to our newsletter.