Datasets for Machine Learning

All geographic information systems rely on a large foundation of structured geospatial data. To help, we at Lionbridge have curated a list of the 15 best publicly available geographic data sources for machine learning.
A curated list of bounding box image datasets. If you’re looking for annotated image or video data, the datasets on this list include images and videos tagged with bounding boxes for a variety of use cases.
Because finding enough relevant data in Korean is difficult, we at Lionbridge have put together a comprehensive list of public Korean datasets for machine learning.
From sentiment analysis models to content moderation models and other NLP use cases, Twitter data can be used to train various machine learning algorithms. 
Developing Russian NLP systems remains a big challenge for researchers and companies alike. To help, we at Lionbridge AI have put together an exhaustive list of the best Russian datasets available on the web, covering everything from social media to natural speech.
We at Lionbridge have compiled a list of 14 movie datasets. Many of the datasets on this list contain data points such as the cast and crew members, script, run time, and reviews. You could use these movie datasets for machine learning projects in natural language processing, sentiment analysis, and more.
