18 Websites to Download Free Datasets for Machine Learning Projects

Article by Rei Morikawa | January 16, 2019

The best way to learn machine learning is to practice with different projects.

In this installment of Lionbridge AI’s article series on open datasets for machine learning, we introduce 18 websites where you can search for and download free datasets.

 

Download Free Datasets Online

  • Kaggle: Data science site that hosts a wide variety of user-contributed datasets. You can find all kinds of niche datasets in its master list, from ramen ratings to basketball data, and even Seattle pet licenses.
  • UCI Machine Learning Repository: One of the oldest sources of datasets on the web, and a great first stop when looking for interesting datasets. The datasets are user-contributed, so their cleanliness varies, but the vast majority are clean. You can download data directly from the UCI Machine Learning Repository without registration.
  • Lionbridge AI: Lionbridge AI has compiled lists of open datasets that are available for free download online. We’ve sorted the best datasets by genre, such as audio datasets and cryptocurrency datasets. You can also order custom datasets for your unique machine learning projects.
  • FiveThirtyEight: Current affairs website that provides the public with the data used for its articles and infographics. It got its start as a polling aggregator solely focused on political topics but has since branched out to cover sports, societal matters, and more. See also the FiveThirtyEight GitHub.
  • Amazon Web Services: Amazon makes large datasets available on its Amazon Web Services platform. You can download the data and work with it on your own computer, or analyze the data in the cloud using EC2 and Hadoop.
  • r/datasets: Subreddit dedicated to sharing, finding, and discussing datasets with other Redditors.
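
Many datasets on sites like these are distributed as CSV files. As a minimal sketch, assuming a downloaded file with a header row, Python's standard csv module is enough for a first look at the columns and values. The sample rows and column names below are made-up placeholders, not taken from any real dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a downloaded CSV file;
# the column names and rows are placeholders, not real data.
SAMPLE_CSV = """name,species,zip_code
Rex,Dog,98101
Milo,Cat,98103
Luna,Dog,98101
"""


def load_rows(file_obj):
    """Read a CSV file object with a header row into a list of dicts."""
    return list(csv.DictReader(file_obj))


def count_values(rows, column):
    """Count how often each value appears in one column."""
    return Counter(row[column] for row in rows)


# For a real download, replace io.StringIO(...) with
# open("your_file.csv", newline="", encoding="utf-8").
rows = load_rows(io.StringIO(SAMPLE_CSV))
species_counts = count_values(rows, "species")
print(len(rows), "rows; columns:", list(rows[0].keys()))
print(species_counts.most_common())
```

Counting the values in each column is a quick way to spot missing or inconsistent entries before training a model.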

 

Download Free Government and Demographic Datasets Online

  • Data.gov: The home of the U.S. Government’s open data. Here you will find data, tools, and resources to conduct research, develop web and mobile applications, design data visualizations, and more.
  • Apertio: Apertio Technologies has built the industry’s first global database and search engine for open government data. The database covers over 2,000 open data sites and trillions of records worldwide.

 

Download Free Social Media Datasets Online

  • Social Computing Data Repository: Datasets from multiple sources such as Twitter and YouTube, in varying sizes.
  • Stanford Large Network Dataset Collection (SNAP): Similar to the Social Computing Data Repository, SNAP has a wide range of datasets of varying sizes from sources such as Facebook and Reddit, so you can find the one that best fits your project needs. SNAP is also the name of a library for analyzing large networks in general, including the SNAP datasets.
  • Network Repository: Includes social networks, web graphs, bio and brain networks, and more. It also offers interactive visual analytics tools for comparing and exploring the various networks.
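
Network datasets are often shared as plain-text edge lists. The sketch below assumes a file with '#' comment lines followed by one whitespace-separated pair of node IDs per line, and treats the graph as undirected; check the documentation of the dataset you download, since formats differ. The toy edges are placeholders, not real data.

```python
import io
from collections import defaultdict

# Hypothetical toy edge list: '#' comment lines, then one
# "source target" pair per line (placeholder, not a real dataset).
SAMPLE_EDGES = """# Toy undirected graph
# FromNodeId\tToNodeId
0\t1
0\t2
1\t2
2\t3
"""


def read_edge_list(file_obj):
    """Build an undirected adjacency map from whitespace-separated pairs."""
    adjacency = defaultdict(set)
    for line in file_obj:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        source, target = line.split()[:2]
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


# For a real download, pass open("your_edges.txt", encoding="utf-8") instead.
graph = read_edge_list(io.StringIO(SAMPLE_EDGES))
degrees = {node: len(neighbors) for node, neighbors in graph.items()}
print(degrees)
```

Node degrees are a simple first statistic; for larger analyses, a dedicated graph library is usually a better fit than hand-rolled dictionaries.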

 

Download Free Finance & Economic Datasets Online

  • Quandl: Source of economic and financial data, useful for building models to predict economic indicators or stock prices.
  • World Bank Open Data: Datasets covering population demographics and a huge number of economic and development indicators from across the world.
  • EU Open Data Portal: Open data published by EU institutions and agencies about the economy, as well as employment, science, environment, and education.
  • IMF Data: The International Monetary Fund publishes data on international finances, debt rates, foreign exchange reserves, commodity prices and investments.
  • American Economic Association (AEA): Source of US macroeconomic data.
  • Eurostat Comext: Datasets on trade flows since 1988, organized by commodity.
  • CIA World Factbook: Economic stats of countries, as well as other stats on demographics, geography, communications, and military.

Still can’t find the data you need to train your model? Lionbridge AI provides custom AI training data in 300 languages for your specific machine learning project needs.

Contact us to learn more about how Lionbridge AI can work for you.

The Author
Rei Morikawa

Rei writes content for Lionbridge’s website, blog articles, and social media. Born and raised in Tokyo, Rei also studied abroad in the US. Rei is a huge people person who is passionate about long-distance running, traveling, and discovering new music on Spotify.
