Datasets for Machine Learning

From sentiment analysis models to content moderation models and other NLP use cases, Twitter data can be used to train various machine learning algorithms.
We have compiled a list of the 16 best crime datasets made available for public use. The datasets come from various locations, and most of the data covers long time periods.
Because finding enough relevant data in Korean is difficult, we at Lionbridge have put together a comprehensive list of public Korean datasets for machine learning.
We at Lionbridge have compiled a list of high-quality Portuguese datasets that cover a wide spectrum of AI use cases, from speech recognition to machine translation.
All geographic information systems rely on a large foundation of structured geospatial data. To help, we at Lionbridge have curated a list of the 15 best publicly available geographic data sources for machine learning.
A curated list of bounding box image datasets. If you’re looking for annotated image or video data, the datasets on this list include images and videos tagged with bounding boxes for a variety of use cases.

