The Ultimate Dataset Library for Machine Learning

Article by Lucas Scott | January 17, 2020

At Lionbridge, we know that high-quality training data can be difficult to find. To help students, data scientists, and development teams get the data they need, we’ve posted a large number of dataset aggregations on our blog. Here, you can find all of those datasets in one convenient place and search for the data you need by use case or data type. This list will be updated regularly, giving you one of the best curated dataset libraries available online.

The datasets are listed alphabetically by use case. Some datasets appear more than once because they belong to multiple categories.


Audio Datasets


Computer Vision Dataset Library


Data Analytics


Fintech and Financial Services Data


Language Dataset Library

NLP Datasets



Social Media Datasets


Miscellaneous Datasets


This dataset library will be updated regularly with new curated lists of the best datasets for each category and use case. Subscribe to our newsletter to be notified of future updates and to keep up with the latest in machine learning.


Lionbridge Data Annotation Services

Still can’t find the data you need for your project? Get in touch to learn more about our services. With over 20 years of experience in translation, linguistics, and AI training data, Lionbridge is trusted by governments and large tech companies worldwide. We are a leader in NLP data outsourcing, image annotation, and more. 

The Author
Lucas Scott

Lucas is a seasoned writer specializing in pop culture and tech. He spends most of his free time coaching high-school basketball, watching Netflix, and working on the next great American novel.