18 Best Datasets for Machine Learning Robotics

Article by Hengtee Lim | February 28, 2020

Robotics datasets are becoming more common thanks to ongoing developments in machine learning for robotics. In healthcare, robots designed to assist staff at busy hospitals are already in testing. In the industrial sector, robots are used to weld, clean, cut, and construct a wide range of tools and objects. And in commercial areas, we’re seeing rapid developments in autonomous vehicles, both on the road and in the air.

Many datasets for machine learning in robotics are open source and available for anybody interested in researching and developing their own robotics solutions. However, the right datasets are not always easy to find, and scouring the internet for them takes time. To help, we’ve put together a list of 18 robotics datasets. It covers robot locomotion, computer vision, robot vehicles, and more.

We hope this list provides you with a solid starting point for learning more about the field, or for starting your own machine learning in robotics project!


Robotics Datasets

General Machine Learning Robotics Datasets

The University of Michigan Robotics Datasets: The UMR datasets page offers access to a wide variety of datasets. Their collection includes datasets for biped robotics, video, safety situational awareness, and leg joint kinematics, kinetics, and EMG activity. The datasets are courtesy of NeuRRo Lab, Roahm Lab, the Corso Research Group, and April Robotics Laboratory.

The MIT Robotics Dataset Repository: Known as Radish, this repository of datasets covers odometry, laser, sonar, and sensor data taken from real robots. You’ll also find environmental maps generated both by robots and by hand.

The Awesome Robotics Datasets: Courtesy of GitHub user Sunglok Choi, this massive repository covers a wide range of datasets broken into the following categories: dataset collections, place-specific datasets, topic-specific datasets, and topic-specific datasets for computer vision. The sheer size of this repository makes it a great starting point for projects related to machine learning in robotics.

RoboNet Large-Scale Multi-Robot Learning Dataset: This dataset, by Berkeley Artificial Intelligence Research, contains 15 million video frames from robots interacting with different objects in a table-top setting. The stated goal of the dataset is “…to pre-train reinforcement learning models on a sufficiently diverse dataset and then transfer knowledge to a different test environment.”

Robot Locomotion Datasets

The RoboTurk Real Robot Dataset: This dataset by Stanford Vision and Learning is currently the largest dataset for robotic manipulation through remote teleoperation. The data was collected over one week with 54 operators, and includes 111 hours of robotic manipulation data on three challenging manipulation tasks. In particular, the data is helpful for tasks that require dexterous control and human planning.

Google Brain Robotics Data: This robot locomotion repository focuses on the actions of robotic arms. The available datasets include grasping, pushing, pouring, and depth image encoding. To support these datasets, you’ll also find a collection of procedurally generated random objects on which to train robot grasping and other tasks.

Dataset of Daily Interactive Manipulation: This robot locomotion dataset aims to teach robots daily interactive manipulations in changing environments. The dataset focuses on the position, orientation, force, and torque of objects manipulated in daily tasks.

Computer Vision Datasets

The Robot@Home Dataset: Published in the International Journal of Robotics Research, this computer vision dataset is for the semantic mapping of home environments. The dataset is a collection of raw and processed sensory data from domestic settings. It contains 87,000+ time-stamped observations.

The Fukuoka Datasets for Place Categorization: The datasets here collect indoor and outdoor scenarios from locations in Fukuoka, Japan. They include forests, urban areas, indoor parking, outdoor parking, coastal and residential areas, corridors, offices, labs, and kitchens.

The DTU Robot Image Datasets: These two datasets of random objects were generated with a unique, experimental setup. One dataset is for evaluating point features, and one is for evaluating multiple view stereo. Because the setup is designed to avoid light pollution, the process allows for large amounts of high-quality data.

Repository of Robotics and Computer Vision Datasets: The MRPT dataset collection is home to datasets from mobile robots, vehicles, and handheld sensors. The collection contains a range of datasets for urban scenarios and odometry, as well as open-source research from other labs.

New College Vision and Laser Dataset: This dataset was gathered while traveling through a college and its adjoining parks. It is intended for the mobile robotics and vision research communities, and for those interested in 6-DoF navigation and mapping.
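For readers new to the term, "6-DoF" means a pose with six degrees of freedom: three for position and three for orientation. The sketch below is only an illustration of that idea and is not taken from any of the datasets above. The class name `Pose6DoF`, its field names, and the yaw-pitch-roll (Z-Y-X) rotation convention are all assumptions; individual datasets may store poses differently, for example as quaternions.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose6DoF:
    """Hypothetical 6-DoF pose: position in metres, angles in radians."""
    x: float
    y: float
    z: float
    roll: float   # rotation about the x axis
    pitch: float  # rotation about the y axis
    yaw: float    # rotation about the z axis

    def rotation_matrix(self):
        """3x3 rotation matrix, assuming R = Rz(yaw) * Ry(pitch) * Rx(roll)."""
        cr, sr = math.cos(self.roll), math.sin(self.roll)
        cp, sp = math.cos(self.pitch), math.sin(self.pitch)
        cy, sy = math.cos(self.yaw), math.sin(self.yaw)
        return [
            [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
            [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
            [-sp, cp * sr, cp * cr],
        ]

    def transform_point(self, point):
        """Map a point from the robot frame into the world frame."""
        r = self.rotation_matrix()
        t = (self.x, self.y, self.z)
        return tuple(
            sum(r[i][j] * point[j] for j in range(3)) + t[i]
            for i in range(3)
        )


# A robot at (1, 2, 3) that has turned 90 degrees to the left.
pose = Pose6DoF(x=1.0, y=2.0, z=3.0, roll=0.0, pitch=0.0, yaw=math.pi / 2)
# A point one metre ahead of the robot, expressed in world coordinates.
ahead = pose.transform_point((1.0, 0.0, 0.0))
print(ahead)
```

With this convention, the point one metre in front of the robot ends up one metre along the world y axis from the robot's position, up to floating-point rounding.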

Machine Learning Robotics Vehicle Datasets

The FMP Dataset: Created by the Ford Center for Autonomous Vehicles (FCAV), the FCAV M-Air Pedestrian (FMP) dataset contains monocular RGB images and planar LiDAR data for pedestrian detection. The dataset was collected in an outdoor environment at the University of Michigan campus.

Oxford Robotics Institute RobotCar Dataset: This dataset contains over 100 repetitions of a consistent route through Oxford, UK. It includes a huge variety of weather types, times of day, and roadworks and construction. The purpose of the dataset is to investigate long-term localisation and mapping for autonomous vehicles in real-world, dynamic urban environments.

The KITTI Vision Benchmark Suite: This suite of datasets was captured by driving around the city of Karlsruhe, Germany, as well as in rural areas and on highways. Up to 15 cars and 30 pedestrians are visible per image.

Complex Urban Dataset: This dataset provides LiDAR data from an urban environment, including complex buildings, residential areas, and metropolitan areas. Its aim is to address the major issues of complex urban areas, such as unreliable and sporadic GPS data, multi-lane roads, complex building structures, and an abundance of highly dynamic objects.

MultiDrone Public Dataset: This dataset was assembled with both pre-existing material and newly filmed material. Much of it has been annotated for visual detection tasks such as tracking bicycles, football players, and human crowds. Other visual data includes boat races, environments, and buildings.

The Autonomous Space Robotics Lab Datasets: ASRL is a robotics lab researching space and terrestrial applications of mobile robots. Their current focus is on vision-based navigation, allowing mobile robots to drive in outdoor, unstructured environments. Datasets here include navigation datasets for lunar roving vehicles, and 3D mapping datasets to emulate planetary terrains.


Still can’t find the robotics dataset you’re looking for? Lionbridge provides a suite of services for machine learning in robotics including data collection, annotation, and validation for robotics projects. Our dedicated community of data scientists and annotators can help source or create the data to kickstart or enhance your project.

The Author
Hengtee Lim

Hengtee is a writer with the Lionbridge marketing team. An Australian who now calls Tokyo home, you will often find him crafting short stories in cafes and coffee shops around the city.

