Article by Limarc Ambalina | August 09, 2019

Looking for image datasets for computer vision? Finding high-quality data is a challenge data scientists face on every project. Image annotation is difficult, and choosing, hiring, and training annotators to tag your data can be time-consuming and costly.

Depending on the size and scope of your project, it may be possible for you to source your training data from open datasets. If you’re looking for annotated image or video data, the datasets on this list include images and videos already annotated with bounding boxes. Find the bounding box image dataset or bounding box video dataset you’re looking for below.


What are the Best Bounding Box Image and Video Datasets for Machine Learning?

Some of the images or videos in the datasets below contain a single annotated object, and others contain multiple objects annotated within the same image or video frame. The datasets have been separated into the following categories: animals, faces and people, medical, vehicles, and miscellaneous.

Note: All dataset links are listed at the bottom of the article in order of their appearance.
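
To make the idea of single-object and multi-object annotations concrete, here is a minimal sketch of how one annotated image might be represented in code. The class and field names (BoundingBox, AnnotatedImage, xmin, ymin, xmax, ymax) and the sample values are assumptions chosen for illustration. Each dataset below ships its own file format, so check its documentation before relying on any particular layout.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BoundingBox:
    """One annotated object: a class label plus corner coordinates in pixels."""
    label: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def to_xywh(self) -> Tuple[float, float, float, float]:
        """Convert corner format to (x, y, width, height)."""
        return (self.xmin, self.ymin, self.xmax - self.xmin, self.ymax - self.ymin)

    def area(self) -> float:
        """Box area in square pixels (0 for a degenerate box)."""
        return max(0.0, self.xmax - self.xmin) * max(0.0, self.ymax - self.ymin)


@dataclass
class AnnotatedImage:
    """An image or video frame with zero or more annotated objects."""
    filename: str
    boxes: List[BoundingBox]


# Hypothetical frame containing two annotated objects.
frame = AnnotatedImage(
    filename="example.jpg",
    boxes=[
        BoundingBox("dog", 10, 20, 110, 220),
        BoundingBox("cat", 150, 40, 250, 140),
    ],
)

for box in frame.boxes:
    print(box.label, box.to_xywh(), box.area())
```

Some datasets store corner coordinates and others store a corner plus width and height, so a small conversion helper like to_xywh is often the first thing needed when combining sources.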


Animal Image and Video Datasets for Computer Vision

1. Cat and Dog Breeds – Funded by the UK India Education and Research Initiative, this bounding box image dataset includes images of 37 different breeds of cats and dogs. There are about 200 images for each class and all images include an annotation for the species and breed name, a bounding box around the animal’s head, and a pixel-level segmentation of the foreground and background of the image.

2. Sea Animals Video Dataset – From Aalborg University, this bounding box video dataset contains 89 videos with bounding box annotations of sea animals in the following six categories: fish, small fish, crab, shrimp, jellyfish, and starfish.

3. Stanford Dogs Dataset – This image dataset has over 20,000 images of 120 different dog breeds. The images have been annotated with both class labels and bounding boxes.


Faces and People Bounding Box Image Datasets for Machine Learning

4. Face Detection in Images – This open-source face image dataset includes over 500 images with over 1,000 faces manually annotated with bounding boxes.

5. Age, Emotion, and Ethnicity Face Images Dataset – With over 1,800 images, this bounding box image dataset includes full-body, partial-body, and face images of multiple people taken from various angles. A small portion of the images has been annotated with bounding boxes and labels for age, ethnicity, gender, and emotion.

6. CelebFaces Attributes – This bounding box image dataset for machine learning includes over 200,000 face images of celebrities. The data has been thoroughly annotated with bounding box annotations, landmark annotations, and attribute labels.


Medical Bounding Box Image Datasets for Computer Vision

7. Dendritic Spines – From researcher Michael Smirnov, this medical image dataset contains images of dendritic spines from visual cortex, Purkinje, and hippocampus cells all annotated with bounding boxes.

8. NIH Chest X-rays – From the National Institutes of Health, this is a large-scale medical imagery dataset which includes over 112,000 chest x-ray images. The images come from over 30,000 different patients and are annotated with bounding boxes around the diseased regions, as well as classified based on the disease.

9. NIH DeepLesion – Also from the National Institutes of Health, this medical bounding box image dataset contains over 32,000 CT slices from over 10,000 CT scans of 4,427 different patients. Each image contains between 1 and 3 lesions with bounding boxes drawn around the lesions.

[Figure: Sample images from the NIH DeepLesion medical image dataset]


10. Malaria Cells – From the Broad Institute, this medical image dataset contains 1,364 images which include around 80,000 cells. The data includes two classes of uninfected cells (RBCs and leukocytes) and four classes of infected cells (gametocytes, rings, trophozoites, and schizonts). Both bounding box coordinates and class labels are included for each cell.


Vehicle Image and Video Datasets for Machine Learning

11. KITTI Vehicle and Pedestrian Detection – From the KITTI Vision Benchmark Suite, this object detection dataset includes over 7,400 training images. The images contain pedestrians and vehicles which have been annotated manually with 3D cuboids.

12. Indian License Plate Detection – This bounding box image dataset includes images of 353 vehicles in India, with 229 of the images annotated with bounding boxes drawn around the license plate in each image.

13. LISA Traffic Light Dataset – With video collected in San Diego, California, this bounding box video dataset has over 23 minutes of driving video totaling over 43,000 frames. Within the frames, 113,888 traffic lights have been annotated with bounding boxes.


Miscellaneous Image and Video Datasets for Computer Vision

14. Accessories and Clothing for E-commerce – This bounding box image dataset includes over 900 images of clothing and accessories from e-commerce sites. 504 items have been manually annotated with a bounding box and a class label from one of the following: jackets, jeans, shirts, shoes, skirts, sunglasses, tops, trousers, and T-shirts.

15. Google Open Images Dataset V5 – This is by far the largest image dataset on this list, and perhaps one of the largest annotated image datasets in existence. Open Images V5 contains about 9 million images annotated with bounding boxes, instance segmentation, image-level labels, and relationship annotations. Its crowdsourced extension adds over 478,000 images with subjects spanning over 6,000 categories, which is why this dataset can’t be placed into any one category. The dataset can be explored based on image categories or types of annotation.

16. Manga109 Character Faces and Japanese Text – From the Aizawa Yamasaki Laboratory at the University of Tokyo, the Manga109 image dataset is a compilation of 109 manga titles. Every page of the 109 manga has been annotated with bounding boxes around both character faces and Japanese text. This dataset can double as a bounding box face image dataset and a Japanese text detection dataset.

[Figure: Manga screenshot, via manga109.org]


17. Multi Salient Objects – This open image dataset includes over 1,200 images. Each image is labeled with the number of salient objects it contains and includes the corresponding bounding box information.

18. PASCAL Visual Object Classes – This dataset for computer vision was made for the PASCAL Visual Object Classes Challenge 2012 and includes images with bounding boxes drawn around each of the target class objects in each image. The object classes include person, bird, cat, cow, dog, horse, sheep, aeroplane, bicycle, boat, bus, car, motorbike, train, bottle, chair, dining table, potted plant, sofa, and tv/monitor.

19. Street View House Numbers – Built for object recognition algorithms, the Street View House Numbers Dataset contains real-world images of house numbers obtained from Google Street View. This image dataset includes over 600,000 images with bounding boxes drawn around the house numbers.

20. YouTube-BoundingBoxes – One of the biggest datasets on this list, YouTube-BoundingBoxes is a large-scale video bounding box dataset. All 240,000 videos have been manually annotated with over 5.6 million bounding boxes around 23 different types of objects. Google reports a 95 percent labeling accuracy for this bounding box video dataset.
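
Several of the datasets above distribute their bounding boxes as annotation files rather than drawing them on the images. As one illustration, the PASCAL Visual Object Classes data (item 18) uses per-image XML files with an object element for each annotated instance and a bndbox element holding its corners. The sketch below reads that layout with Python's standard library. The sample XML is hand-written with made-up values, and parse_voc_annotation is a hypothetical helper name, not part of any official toolkit.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the PASCAL VOC XML layout; all values are made up.
SAMPLE_XML = """
<annotation>
  <filename>example.jpg</filename>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>370</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc_annotation(xml_text):
    """Return (filename, boxes) where boxes is a list of dicts, one per object."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.iter("object"):
        bnd = obj.find("bndbox")
        box = {"label": obj.findtext("name")}
        # Parse via float first in case a file stores coordinates as decimals.
        for key in ("xmin", "ymin", "xmax", "ymax"):
            box[key] = int(float(bnd.findtext(key)))
        boxes.append(box)
    return root.findtext("filename"), boxes


filename, boxes = parse_voc_annotation(SAMPLE_XML)
print(filename)
for box in boxes:
    print(box)
```

Other datasets on this list use different formats, such as CSV or JSON, so the parsing step would change while the resulting list of labeled boxes can stay the same.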


