What is Video Annotation?

Video annotation involves adding metadata to unlabeled video in order to train a machine learning algorithm. This metadata, also referred to as tags or labels, could be anything from a bounding box around a certain part of the image to full segmentation, where every pixel is annotated with its semantic meaning. Video annotation is used to train algorithms for a variety of tasks, from simple classification to the tracking of objects across multiple frames.
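To make the idea of per-frame metadata concrete, here is a minimal sketch of how the labels for one frame might be stored. The class names, field names, and pixel-coordinate convention are assumptions chosen for illustration; they do not describe any particular platform's export format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BoundingBox:
    """A labeled rectangle in pixel coordinates (hypothetical layout)."""
    label: str
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


@dataclass
class FrameAnnotation:
    """All labels attached to a single frame of video."""
    frame_index: int
    boxes: List[BoundingBox] = field(default_factory=list)


def labels_in_frame(frame: FrameAnnotation) -> List[str]:
    """Return the distinct labels present in a frame, sorted by name."""
    return sorted({box.label for box in frame.boxes})


frame = FrameAnnotation(
    frame_index=0,
    boxes=[
        BoundingBox("car", 10, 20, 100, 50),
        BoundingBox("pedestrian", 200, 40, 30, 80),
    ],
)
print(labels_in_frame(frame))
```

A full video would then be a list of such records, one per frame.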

For all of these tasks, it’s crucial that each frame of video in your dataset is accurately labeled. A human-annotated video dataset can have a transformative impact on machine learning models in a range of industries, from autonomous vehicles to manufacturing. At Lionbridge, we’ve spent decades accumulating the experience and technology necessary to serve all of your video annotation needs. Whether your project requires simple or complex labeling, every dataset that we create has the accuracy to improve your machine learning model.

Why Lionbridge?

20 years of experience has made us an industry-leading provider of video annotation services.

Customizable Workflow

If you have specific requirements to fulfill or want to keep your data private, Lionbridge has a workflow to suit your needs.

Expert Annotators

We train every contributor to ensure that their labels are accurate and appropriate for your project.


Quality Assurance

Our quality assurance system combines technology with human experience to comprehensively analyze each of your data points.

1 million+ Contributors
300+ Languages
20+ Years of Experience

Our Video Annotation Services

2D and 3D Bounding Boxes

Our expert annotators draw 2D or 3D bounding boxes around objects in video for a wide range of use cases. Thanks to our custom-built workbench, we’re able to create multiple precise bounding boxes within every frame of your video. Our rigorous quality assessment also ensures accurate annotations.
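One common way to measure how closely a bounding box matches a reference box is intersection over union (IoU). The sketch below is a generic illustration of that metric, not a description of our internal quality checks; the `(x_min, y_min, x_max, y_max)` box layout is an assumption.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if the boxes overlap at all.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp to zero so disjoint boxes produce no intersection area.
    intersection = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


annotated = (0, 0, 10, 10)
reference = (5, 0, 15, 10)
print(iou(annotated, reference))
```

A score of 1.0 means the two boxes coincide exactly, and 0.0 means they do not overlap.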


Polygon Annotation

Lionbridge provides you with the team, platform, and project managers you need to draw pixel-perfect polygons around objects in your frames. Designed with your use case in mind, our platform has been built to annotate irregular objects with a high degree of accuracy.

Landmark Annotation

We offer fast, efficient keypoint annotation over multiple frames for a range of use cases, including facial recognition, emotion detection, and counting applications. Our platform has extensive capabilities that allow us to accurately label anatomical or structural points of interest in your data.

Semantic Segmentation

Lionbridge has experience building video datasets with pixelwise annotation for clients in a diverse range of industries. Our project managers draw on years of experience to ensure that each frame is pixel-accurate and conforms to your specifications.
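As an illustration of what pixelwise annotation means, the sketch below represents one tiny frame as a grid of class IDs and counts the pixels assigned to each class. The class names and the 3x4 mask are made-up example data.

```python
from collections import Counter

# Hypothetical class IDs for illustration only.
CLASS_NAMES = {0: "background", 1: "road", 2: "vehicle"}

# One frame's segmentation mask: every pixel holds exactly one class ID.
mask = [
    [0, 0, 2, 2],
    [1, 1, 2, 2],
    [1, 1, 1, 1],
]


def pixel_counts(mask, class_names):
    """Count how many pixels in the mask belong to each named class."""
    counts = Counter(class_id for row in mask for class_id in row)
    return {class_names[class_id]: n for class_id, n in counts.items()}


print(pixel_counts(mask, CLASS_NAMES))
```

Real masks have the same shape as the video frame, so every pixel carries a label.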

How it Works

1. Project set-up

Our team will work with you to develop a custom solution based on your project objectives and timeline.

2. Production

Our crowd of expert annotators gets to work labeling and reviewing your video data.

3. Delivery

Our project management team checks, packages, and formats the data before sending it to you for final approval.

Our Video Annotation Solutions

Object Localization

Lionbridge provides video classification and localization services for a range of project types. By combining our versatile tech platform with the experience of our project managers, we can design a custom workflow for bounding box, polygon or line annotation that suits your project’s specific requirements.

Object Detection

For any computer vision project, it’s essential that your dataset has accurate annotations for both multiple classes and multiple instances of each object in your classification system – often within the same frame of video. Lionbridge can establish a high level of accuracy across your entire dataset in a range of annotation types.

Video Tracking

Footage with frame-by-frame tracking is an important tool for anyone creating a machine learning algorithm in the autonomous vehicles sector. Lionbridge’s workflow ensures that objects annotated in the first frame are accurately followed through subsequent frames. Our keen eye for detail ensures that every movement of your object is captured.
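A technique often used in video annotation tools to follow an object between hand-labeled keyframes is linear interpolation of its bounding box. The sketch below shows that general idea under simple assumptions (boxes as `(x_min, y_min, x_max, y_max)` tuples, steady motion between keyframes); it is not a description of the Lionbridge workflow, and the function name is hypothetical.

```python
def interpolate_box(start_box, end_box, start_frame, end_frame, frame):
    """Estimate a box at `frame` by blending two keyframe boxes linearly."""
    if not start_frame <= frame <= end_frame:
        raise ValueError("frame must lie between the two keyframes")
    if start_frame == end_frame:
        return tuple(float(v) for v in start_box)

    # Fraction of the way from the first keyframe to the second.
    t = (frame - start_frame) / (end_frame - start_frame)
    return tuple(a + t * (b - a) for a, b in zip(start_box, end_box))


# Keyframes at frame 0 and frame 10; estimate the box at frame 5.
print(interpolate_box((0, 0, 10, 10), (10, 20, 20, 30), 0, 10, 5))
```

Interpolated boxes are only estimates, so a human reviewer would still need to correct frames where the object speeds up, turns, or is occluded.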


Video Annotation Pricing

The Lionbridge platform streamlines much of the process, allowing us to offer the most cost-effective video annotation solution in the industry. Contact us to get a free estimate for your project.

  • Account Manager
  • Project Management
  • 24/7 Support
  • API
  • NDA
  • Volume pricing
  • Custom reporting
  • Enterprise-grade SLAs
  • Custom invoicing
  • Consulting services
Get in touch with our team today

Multilingual Audio Transcription Services

Lionbridge provides custom audio transcription services in over 300 languages. Some of our most popular languages include:

  • Chinese audio transcription services
  • Dutch audio transcription services
  • French audio transcription services
  • German audio transcription services
  • Italian audio transcription services
  • Japanese audio transcription services
  • Portuguese audio transcription services
  • Spanish audio transcription services

Learn more about video annotation on our blog

Vahan Petrosyan is the co-founder of image annotation startup SuperAnnotate. In a conversation with us, he discussed recent developments in image annotation and how they will affect the demand for AI training data.
Humans and animals use their eyes to see the world around them; computer vision is the science that aims to give machines a similar skill.
Moderating content is a complex task that goes far beyond the controversial work of vetting violent or extremist content. In this article, we talk about what content moderation involves and introduce a range of different ways that content is processed and filtered.