Create an End-to-End Object Detection Pipeline using YOLOv5

Article by Rahul Agarwal | June 22, 2020

Ultralytics recently launched YOLOv5 amid controversy surrounding its name. For context, the first three versions of YOLO (You Only Look Once) were created by Joseph Redmon. Following this, Alexey Bochkovskiy created YOLOv4 on darknet, which boasted higher Average Precision (AP) and faster results than previous iterations.

Now, Ultralytics has released YOLOv5, with comparable AP and faster inference times than YOLOv4. This has left many asking: is a new version warranted given similar accuracy to YOLOv4? Whatever the answer may be, it’s definitely a sign of how quickly the detection community is evolving.

Source: Ultralytics YOLOv5

Since they first ported YOLOv3, Ultralytics has made it very simple to create and deploy models using PyTorch, so I was eager to try out YOLOv5. As it turns out, Ultralytics has further simplified the process, and the results speak for themselves.

In this article, we’ll create a detection model using YOLOv5, from creating our dataset and annotating it to training and running inference using their remarkable library. This post focuses on the implementation of YOLOv5, including:

  • Creating a toy dataset
  • Annotating the image data
  • Creating the project structure
  • Training YOLOv5
  • Prediction


Creating a Custom Dataset

As I don’t have an image dataset to work with, I will download data from the Open Image Dataset (OID). This is an excellent resource for annotated image data that can be used for both classification and detection. However, instead of using the OID’s provided annotations, we’ll be creating our own for the sake of learning.


1. Download Images from OIDv4:

To download images from the Open Image dataset, we start by cloning the OIDv4_ToolKit and installing all requirements.

git clone https://github.com/EscVM/OIDv4_ToolKit.git
cd OIDv4_ToolKit
pip install -r requirements.txt

We can now use the main.py script in this folder to download images and labels for multiple classes.

We’ll download Cricketball and Football images to create our custom dataset, where our learning task will be to detect footballs and cricket balls.

python3 main.py downloader --classes Cricket_ball Football --type_csv all -y --limit 500

The above command creates a directory named “OID” with the following structure:

The OID directory structure. We will take only the image files (.jpg) from here and not the labels because we will manually annotate our custom dataset. However, we can use these labels for different projects if necessary.
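
To inspect the downloaded tree yourself, a small sketch that walks the output folder (assuming the toolkit’s default OID/ directory) looks like this:

```python
import os

# Walk the OID/ tree and print each folder with its file count
for root, dirs, files in os.walk('OID'):
    depth = root.count(os.sep)
    print('  ' * depth + os.path.basename(root) + f'  ({len(files)} files)')
```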

Before moving on, we need to copy all the images in the same folder to start our data labeling. This can be done manually, or programmatically using the recursive glob function:

import os
from glob import glob
from shutil import copy

# Collect every downloaded .jpg recursively and copy it into a single Images folder
os.makedirs("Images", exist_ok=True)
for img in glob('OID/**/*.jpg', recursive=True):
    copy(img, "Images/")


2. Label Image Data

We will use an open source tool for our data labeling. It lets us create a project, set up its labels, and jump straight into annotation, as you can see below.

Top to bottom, left to right: 1. Create Project, 2. Set up Labels, 3. Add Local Image data source, 4. Annotate

We’ll export the data in the YOLO format when we are finished annotating. If necessary, you can also get your annotations in JSON format (COCO) or XML format (Pascal VOC).

5. Export

Exporting in the YOLO format creates a .txt file for each of our images, containing the class_id, x_center, y_center, width, and height of each bounding box, all expressed relative to the image dimensions. It also creates a file named obj.names, which maps each class_id to its class name. I’ve put an example below:

The image
The annotation
The obj.names file

Notice that the coordinates in the annotation file are scaled to the range 0 to 1. Also, note that the class_id is 0 for Cricketball and 1 for Football as per the obj.names file, whose IDs start from 0. There are a few other files created during this export, but we won’t be using them in this example.
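
To make the scaling concrete, here is a minimal sketch (the function name and the example box are my own illustration) that converts an absolute pixel bounding box into YOLO’s normalized format:

```python
def to_yolo(x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space box to YOLO's normalized (x_center, y_center, w, h)."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return x_center, y_center, width, height

# A 200x100 px box whose top-left corner is at (100, 100) in a 400x400 image
print(to_yolo(100, 100, 300, 200, 400, 400))  # (0.5, 0.375, 0.5, 0.25)
```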

Once the export is complete, we only need to rearrange some of our files for subsequent training and validation splits during the model training.

The dataset will now be a single folder containing both the images and the annotations, as we can see below:

    - 0027773a6d54b960.jpg
    - 0027773a6d54b960.txt
    - 2bded1f9cb587843.jpg
    - 2bded1f9cb587843.txt
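
Before moving on, it is worth verifying that every image has a matching annotation file; a minimal sketch, assuming the folder is named dataset:

```python
import os
from glob import glob

dataset_path = 'dataset'  # assumed folder name; adjust to your own
unlabeled = [img for img in glob(os.path.join(dataset_path, '*.jpg'))
             if not os.path.exists(os.path.splitext(img)[0] + '.txt')]
print(f"{len(unlabeled)} images are missing annotations")
```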


Setting up the Project

To train our custom object detector, we will use YOLOv5 from Ultralytics. To start, we clone the repository and install the dependencies:

git clone https://github.com/ultralytics/yolov5  # clone repo
cd yolov5
pip install -U -r requirements.txt

Next, we create our own folder named “training” in which we keep our custom dataset.

!mkdir training

We’ll copy our custom dataset folder into this folder and create the train/validation split using the simple train_val_folder_split.ipynb notebook. The code below creates training and validation folders and populates them with the images and labels.

import glob, os
import random

# put your own path here
dataset_path = 'dataset'

# Percentage of images to be used for the validation set
percentage_test = 20

# Create the folder structure
!mkdir -p data/images/train data/images/valid data/labels/train data/labels/valid

# Populate the folders, sending roughly 20% of the images to the validation set
p = percentage_test / 100
for pathAndFilename in glob.iglob(os.path.join(dataset_path, "*.jpg")):
    title, ext = os.path.splitext(os.path.basename(pathAndFilename))
    if random.random() <= p:
        os.system(f"cp {dataset_path}/{title}.jpg data/images/valid")
        os.system(f"cp {dataset_path}/{title}.txt data/labels/valid")
    else:
        os.system(f"cp {dataset_path}/{title}.jpg data/images/train")
        os.system(f"cp {dataset_path}/{title}.txt data/labels/train")

After running this, your data folder structure should look like the image below, with two directories for images and labels.
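
A quick sanity check on the split (the 20% figure is only approximate, since the assignment is random) can be sketched as:

```python
from glob import glob

# Count the images that landed in each split
n_train = len(glob('data/images/train/*.jpg'))
n_valid = len(glob('data/images/valid/*.jpg'))
total = n_train + n_valid
if total:
    print(f"train: {n_train}, valid: {n_valid} ({100 * n_valid / total:.0f}% validation)")
```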

We now have to add two configuration files to the training folder:

1. Dataset.yaml: The “dataset.yaml” file contains the paths of the training and validation images, the number of classes, and the class names.

# train and val datasets (image directory or *.txt file with image paths)
train: training/data/images/train/
val: training/data/images/valid/

# number of classes
nc: 2

# class names

names: ['Cricketball', 'Football']


2. Model.yaml: We can use multiple models ranging from small to large while creating our network. For example, the yolov5s.yaml file in the yolov5/models directory is the small YOLO model with 7M parameters, while the yolov5x.yaml is the largest YOLO model with 96M parameters. For this project, I will use the yolov5l.yaml file, which has 50M parameters. To do this, we’ll copy the file from yolov5/models/yolov5l.yaml to the training folder and change nc (which is the number of classes) to 2, as per our project requirements.

# parameters
nc: 2 # change number of classes
depth_multiple: 1.0 # model depth multiple
width_multiple: 1.0 # layer channel multiple
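
The copy-and-patch step above can also be done programmatically; here is a sketch, assuming the stock yolov5l.yaml declares nc at the start of a line:

```python
import os
import re
import shutil

def set_num_classes(cfg_text, nc):
    """Replace the first 'nc:' entry in a YOLOv5 model config with a new value."""
    return re.sub(r'^nc:\s*\d+', f'nc: {nc}', cfg_text, count=1, flags=re.M)

src = 'yolov5/models/yolov5l.yaml'  # stock config (path assumed)
dst = 'training/yolov5l.yaml'
if os.path.exists(src):
    shutil.copy(src, dst)
    with open(dst) as f:
        patched = set_num_classes(f.read(), 2)
    with open(dst, 'w') as f:
        f.write(patched)
```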



At this point our training folder looks like this: 

Training YOLOv5

Once we are done with the above steps, we can start training our model. This is as simple as running the command below, where we provide the locations of our config files and various other parameters. You can check out other options in the train.py file, but these are the ones I found noteworthy.

# Train yolov5l on custom dataset for 300 epochs
$ python train.py --img 640 --batch 16 --epochs 300 --data training/dataset.yaml --cfg training/yolov5l.yaml --weights ''

Sometimes you might get an error with the above command on PyTorch’s latest version (1.5) due to a problem with multiple GPUs. In that case, you can choose to run on a single GPU using:

# Train yolov5l on custom dataset for 300 epochs
$ python train.py --img 640 --batch 16 --epochs 300 --data training/dataset.yaml --cfg training/yolov5l.yaml --weights '' --device 0

Once training starts, you can confirm that it has been set up by checking the automatically created train_batch0.jpg file, which contains the training labels for the first batch. You can also check test_batch0_gt.jpg, which includes the ground truth for test images. This is how they look for me:

Left: train_batch0.jpg, Right: test_batch0_gt.jpg



To see the training results at localhost:6006 in your browser using TensorBoard, run this command in another terminal tab:

tensorboard --logdir=runs

Here are the various validation metrics. These metrics are also saved in the results.png file at the end of the training run.



Prediction

Ultralytics YOLOv5 provides a large variety of ways to check the results on new data.

To detect images, simply put them in the folder named inference/images and run the inference using the best weights as per the validation AP:

python detect.py --weights weights/best.pt


You can also run detection on a video using the same detect.py file:

python detect.py --weights weights/best.pt --source inference/videos/messi.mp4 --view-img --output inference/output

Here I specify that I want to view the output with the --view-img flag, and we store the output at the location inference/output, which creates a .mp4 file there. It’s impressive how well the network sees the ball, how quickly inference runs, and how accurate it is on data observed for the first time.

You can also use the webcam as a source by specifying --source as 0. You can check out the various other options in the detect.py file.



In this post, I showed how to create a YOLOv5 object detection model using a custom dataset. I love the way Ultralytics has made it so easy to create an object detection model. In addition, they have also provided a variety of ways to see the model’s results.

If you would like to experiment with the custom dataset made in this article, you can download the annotated data on Kaggle and the code at Github.

If you want to know more about object detection techniques, motion estimation, and object tracking in video, I recommend this course on Deep Learning in Computer Vision. You can also learn more about the evolution of the object detection field here.

This site is regularly updated with technical guides on how to implement machine learning systems. You can find a selection in the related resources below. Be sure to also check out my guides to text classification and image classification for further reading!

The Author
Rahul Agarwal

Rahul is a data scientist currently working with Facebook. He enjoys working with data-intensive problems and is constantly in search of new ideas to work on. Contact him on Twitter: @MLWhiz

