Top 10 Vehicle and Cars Datasets for Machine Learning

Article by Limarc Ambalina | July 11, 2019

With the rise of Tesla’s self-driving cars and projects like Google’s Waymo, the autonomous vehicle industry continues to grow year after year. Autonomous vehicles are a high-interest area of computer vision with numerous applications and a large potential for profit. As with most computer vision applications, autonomous vehicle systems require large amounts of image data for training. It is often difficult to gain access to high-quality car datasets or to find a reputable image annotation service, and manually annotating thousands of images yourself is even more arduous and often inefficient.


Vehicle and Cars Datasets

Below is a list of 10 open image and video datasets well suited to autonomous vehicle research and development. Together, these datasets contain over 250,000 images and still video frames, some of which are already annotated.
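As a quick sanity check on the 250,000 figure, the per-dataset counts quoted in this article can be tallied in a few lines of Python. This is only a sketch: the dictionary keys are shorthand labels, entries described as "over" a number are treated as lower bounds, and the LISA frame count is assumed to be 43,007.

```python
# Image/frame counts as quoted in this article.
# Values described as "over N" are used as lower bounds.
counts = {
    "BIT Vehicle": 9_850,
    "Cityscapes Image Pairs": 2_975,
    "GTI Vehicle Image Database": 3_425 + 3_900,  # vehicle + empty-road images
    "KITTI Object Detection": 7_518 + 7_481,      # testing + training images
    "LISA Traffic Light": 43_007,                 # assumed frame count
    "Nepalese Vehicles": 4_800,
    "Rain and Snow Traffic Surveillance": 130_000,
    "Stanford Cars": 16_185,
    "Semantic Segmentation for Self Driving Cars": 5_000,
    "TME Motorway": 30_000,
}

total = sum(counts.values())
print(f"At least {total:,} images and frames in total")
```

Under these assumptions the total comes to at least 264,141, which is consistent with the "over 250,000" claim.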


1. BIT Vehicle Dataset – From the Beijing Laboratory of Intelligent Information Technology, this dataset includes 9,850 vehicle images. The images are divided into the following six categories by vehicle type: bus, microbus, minivan, sedan, SUV, and truck.

2. Cityscapes Image Pairs – Using traffic videos shot from vehicles driven in Germany, this dataset includes 2,975 image pairs. Each individual image file has the original still frame on the left and the same frame semantically segmented on the right.

Sample image from the Cityscapes Image Pairs Dataset

3. GTI Vehicle Image Database – This dataset includes 3,425 rear-angle images of vehicles on the road, as well as 3,900 images of roads absent of any vehicles.

4. KITTI Object Detection with Bounding Boxes – Drawn from the object detection section of the benchmark suite from the Karlsruhe Institute of Technology, this dataset includes over 14,000 images: 7,481 training images and 7,518 testing images, with bounding box labels in a separate file.

Sample image from the KITTI Object Detection Dataset

5. LISA Traffic Light Dataset – While this dataset focuses on traffic lights rather than vehicles, it is still a very useful image dataset for training autonomous vehicle algorithms. It includes both nighttime and daytime videos totaling 43,007 frames, which contain 113,888 annotated traffic lights. Almost all of the frames show both traffic lights and vehicles.

6. Nepalese Vehicles – Drawn from 30 traffic videos taken in the streets of Kathmandu, this dataset includes 4,800 vehicle images cropped from those videos. Of the 4,800 images, 1,811 are of two-wheeled vehicles and 2,989 are of four-wheeled vehicles.

7. Rain and Snow Traffic Surveillance – This dataset consists of 22 videos, each around five minutes long. The videos were captured with both an RGB color camera and an infrared thermal camera, yielding over 130,000 RGB-thermal image pairs.

8. Stanford Cars Dataset – From the Stanford AI Laboratory, this dataset includes 16,185 images spanning 196 classes of cars.

9. Semantic Segmentation for Self Driving Cars – Created as part of the Lyft Udacity Challenge, this dataset includes 5,000 images and corresponding semantic segmentation labels.

10. TME Motorway Dataset – Composed of 28 video clips totaling 27 minutes of footage, this dataset includes over 30,000 frames with vehicle annotations.
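Because each Cityscapes Image Pairs file (item 2) stores the original frame on the left and its segmented version on the right, a typical first preprocessing step is to split every file down the middle. The sketch below shows one way to do that with NumPy. The function name split_image_pair and the 256 x 512 synthetic array are hypothetical stand-ins for illustration; a real pipeline would first load each file into an array with an image library.

```python
import numpy as np


def split_image_pair(pair: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split a side-by-side frame into (original, segmented) halves."""
    width = pair.shape[1]
    if width % 2 != 0:
        raise ValueError("expected an even width for a side-by-side pair")
    half = width // 2
    # Left half: original still frame. Right half: segmented frame.
    return pair[:, :half], pair[:, half:]


# Synthetic stand-in for a loaded image (hypothetical size):
# black left half, white right half.
demo = np.zeros((256, 512, 3), dtype=np.uint8)
demo[:, 256:] = 255

original, segmented = split_image_pair(demo)
print(original.shape, segmented.shape)
```

The two returned arrays are views into the input rather than copies, so call .copy() on them if the halves will be modified independently.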

Still can’t find the car or vehicle datasets you need to train your model? Lionbridge AI is a global provider of custom AI training data for some of the world’s largest tech companies. Get in touch and use our large crowd of multilingual expert annotators to obtain high-quality image and video data for your project.


Multilingual Image and Video Annotation Services

Lionbridge provides professional image and video annotation services for a variety of use cases.

Some of our most popular services include:

  • 2D Bounding Boxes
  • 3D Cuboids
  • Polygons
  • Lines and Splines
  • Landmark / Keypoint Annotation
  • Semantic Segmentation
  • Pixel-wise Segmentation
  • Image Classification
The Author
Limarc Ambalina

Limarc writes content for Lionbridge’s website as part of the marketing team. Born and raised in Canada, Limarc’s love of Japanese pop culture brought him to Japan in 2016 and living in Japan has been his dream come true. Apart from Lionbridge content, you can catch Limarc online writing about anime, video games, and other nerd culture.

