Top 10 Image Classification Datasets for Machine Learning

Article by Lucas Scott | December 18, 2019

To help you build object recognition models, scene recognition models, and more, we’ve compiled a list of the best image classification datasets. They vary in scope and size, so they suit a variety of use cases. We have grouped them into three categories: medical imaging, agriculture and scene recognition, and other datasets.


Medical Image Classification Datasets

1. Recursion Cellular Image Classification – This data comes from the Recursion 2019 challenge. The goal of the competition was to use biological microscopy data to develop a model that identifies replicates. Full details are available on the competition page.

2. TensorFlow patch_camelyon Medical Images – This medical image classification dataset is available through TensorFlow Datasets. It contains just over 327,000 color images, each 96 x 96 pixels. The images are patches of histopathological lymph node scans, each labeled to indicate whether it contains metastatic tissue.
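For a dataset of this size, it is worth estimating memory needs before loading everything at once. The short Python sketch below assumes each image is held as an uncompressed height x width x channels array and ignores labels and framework overhead. The `dataset_bytes` helper is hypothetical and is not part of TensorFlow.

```python
def dataset_bytes(n_images: int, height: int, width: int,
                  channels: int = 3, bytes_per_value: int = 1) -> int:
    """Raw size in bytes of n_images dense, uncompressed image arrays."""
    return n_images * height * width * channels * bytes_per_value


GIB = 2 ** 30  # bytes per gibibyte

# patch_camelyon: just over 327,000 RGB images of 96 x 96 pixels.
as_uint8 = dataset_bytes(327_000, 96, 96)                       # 1 byte per channel value
as_float32 = dataset_bytes(327_000, 96, 96, bytes_per_value=4)  # 4 bytes per channel value

print(f"uint8:   {as_uint8 / GIB:.1f} GiB")
print(f"float32: {as_float32 / GIB:.1f} GiB")
```

Stored as 8-bit integers, the pixels alone come to roughly 8.4 GiB. Converting them to 32-bit floats for training quadruples that figure, which is why streaming the data in batches is usually preferable to holding it all in memory.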


Agriculture and Scene Datasets

3. CoastSat Image Classification Dataset – Used for an open-source shoreline mapping tool, this dataset includes satellite images. It also includes metadata for the labels.

4. Images for Weather Recognition – Used for multi-class weather recognition, this dataset is a collection of 1,125 images divided into four categories: sunrise, shine, rain, and cloudy.

5. Indoor Scenes Images – From MIT, this dataset contains over 15,000 images of indoor locations. It was originally built to tackle the problem of indoor scene recognition. All images are in JPEG format and are divided into 67 categories. The number of images per category varies, but every category has at least 100 images.

6. Intel Image Classification – Created by Intel for an image classification contest, this dataset contains approximately 25,000 images in six categories: buildings, forest, glacier, mountain, sea, and street. It is split into folders for training, testing, and prediction, which hold around 14,000, 3,000, and 7,000 images respectively.

7. TensorFlow Sun397 Image Classification Dataset – Another dataset from TensorFlow, this one contains over 108,000 images used in the Scene Understanding (SUN) benchmark, divided into 397 categories. The number of images per category varies, but every category has at least 100 images.
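The Intel, Indoor Scenes, and Sun397 datasets all group images by category, and their category sizes vary. Because Indoor Scenes and Sun397 guarantee at least 100 images per category, one simple way to avoid class imbalance is to sample the same number of images from every category. The Python sketch below assumes the images have been unpacked into one subfolder per category. That layout, the file extensions, and the helper names `index_by_category` and `balanced_subset` are assumptions for illustration, not the datasets' documented formats.

```python
import random
from collections import defaultdict
from pathlib import Path

# Assumed image extensions; adjust to match the files you actually download.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def index_by_category(root: Path) -> dict[str, list[Path]]:
    """Map each category (subfolder name) to the image files inside it.

    Assumes a hypothetical layout of root/<category>/<image file>.
    """
    index = defaultdict(list)
    for path in sorted(root.glob("*/*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            index[path.parent.name].append(path)
    return dict(index)


def balanced_subset(index: dict[str, list[Path]], per_category: int,
                    seed: int = 0) -> dict[str, list[Path]]:
    """Sample exactly `per_category` files from every category.

    Raises ValueError if any category is too small, instead of silently
    producing an unbalanced subset.
    """
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    subset = {}
    for category, files in sorted(index.items()):
        if len(files) < per_category:
            raise ValueError(f"{category!r} has only {len(files)} images")
        subset[category] = rng.sample(files, per_category)
    return subset
```

For example, `balanced_subset(index_by_category(Path("indoor_scenes")), per_category=100)` would draw 100 images from each of the 67 Indoor Scenes categories, assuming that folder layout. The directory name here is hypothetical.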


Other Image Classification Datasets

8. Architectural Heritage Elements – This dataset was created to train models that classify images of architectural heritage elements. It contains over 10,000 images divided into 10 categories: altar, apse, bell tower, column, dome (inner), dome (outer), flying buttress, gargoyle, stained glass, and vault.

9. Image Classification: People and Food – This dataset is distributed as a CSV file describing images of people eating food. Human annotators classified the images by gender and age. The CSV file contains 587 rows, each with a URL linking to an image.

10. Images of Cracks in Concrete for Classification – From Mendeley, this dataset includes 40,000 images of concrete, each 227 x 227 pixels. Half of the images show concrete with cracks and half show concrete without cracks.
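The concrete dataset is split evenly between cracked and uncracked images, but a purely random train/test split can drift from that 50/50 ratio by chance. Splitting each class separately (a stratified split) preserves the ratio exactly. The minimal standard-library sketch below illustrates the idea. The `stratified_split` helper and the file names are hypothetical stand-ins, not files from the actual download.

```python
import random


def stratified_split(items_by_label: dict[str, list[str]], test_fraction: float,
                     seed: int = 0) -> tuple[list[tuple[str, str]], list[tuple[str, str]]]:
    """Split each label's items separately so train and test keep the same class ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for label, items in sorted(items_by_label.items()):
        shuffled = items[:]  # copy so the caller's list is left untouched
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test += [(item, label) for item in shuffled[:n_test]]
        train += [(item, label) for item in shuffled[n_test:]]
    return train, test


# Hypothetical file names standing in for the 20,000 cracked and
# 20,000 uncracked images described above.
data = {
    "crack": [f"crack_{i:05d}.jpg" for i in range(20_000)],
    "no_crack": [f"no_crack_{i:05d}.jpg" for i in range(20_000)],
}
train, test = stratified_split(data, test_fraction=0.2)
print(len(train), len(test))
```

With a 20% test fraction, each class contributes 4,000 test images and 16,000 training images, so both splits stay exactly balanced.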


Image Classification Services

We hope the datasets above help you get the training data you need. If your project requires more specialized training data, we can help you annotate or build your own custom image datasets. Check out our services for image classification, or contact our team to learn more about how we can help.

The Author
Lucas Scott

Lucas is a seasoned writer who specializes in pop culture and tech. He spends most of his free time coaching high-school basketball, watching Netflix, and working on the next great American novel.

