The growth of the AI industry has driven increasing demand for data annotation services and, with it, a growing number of data annotation companies. But what exactly are annotation services, and how do you use them to their full potential? This article covers the types of annotation services, how to ensure high data annotation quality, and tips to help minimize annotation costs.
What are Annotation Services in Machine Learning?
Within the field of machine learning, annotation service providers are companies that annotate and process raw data for the purpose of training AI models. Due to the large scale of data labeling tasks, annotation companies often employ crowdworkers to label the data and complete the project within the client’s timeframe.
Types of Annotation Services
You can use annotation services to complete a variety of tasks depending on the type of training data and the scope of your project. Most AI training data will be in the form of images, video, audio, or text.
3 Ways to Improve Data Quality
Some studies have found that the quantity and quality of the training data are more important than the type of model chosen. The first step when annotating training data is to thoroughly vet annotators and put quality assurance protocols in place. However, quality does not simply mean accurate annotations. Data quality is also influenced by annotator bias and noise in the data.
To ensure high-quality data production throughout the annotation process, there are various QA protocols you can put in place.
1. Self-Agreement Tests
One way to test your annotators is through self-agreement testing. You can do this test by giving your annotator the same piece of data twice. However, you should not have them annotate the same data twice in a row. It is best to space the two instances far enough apart that the annotator is unlikely to realize they have seen the data before. If both annotations match, you know the annotator is consistent in their work.
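As a rough sketch, the spacing and scoring logic described above might look like the following. The function names, the 10% duplicate rate, and the minimum gap are illustrative assumptions, not any specific vendor's implementation:

```python
import random

def build_task_queue(items, duplicate_rate=0.1, min_gap=50, seed=0):
    """Insert hidden duplicates into an annotation queue, placed at least
    `min_gap` positions after the original so annotators are unlikely
    to notice the repeat."""
    rng = random.Random(seed)
    queue = list(items)
    n_dupes = max(1, int(len(items) * duplicate_rate))
    for item in rng.sample(items, n_dupes):
        first = queue.index(item)
        # the duplicate goes somewhere at least min_gap positions later
        lo = min(first + min_gap, len(queue))
        queue.insert(rng.randint(lo, len(queue)), item)
    return queue

def self_agreement(labels_by_item):
    """Fraction of duplicated items where both passes got the same label.
    `labels_by_item` maps item -> [label_first_pass, label_second_pass]."""
    pairs = [v for v in labels_by_item.values() if len(v) == 2]
    if not pairs:
        return None
    return sum(a == b for a, b in pairs) / len(pairs)
```

In practice you would track which queue positions are duplicates and compute the score per annotator once their batch comes back.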
Low self-agreement is a warning sign, but it does not necessarily mean the annotator is a poor worker. The cause could be a lack of appropriate training, or an annotation task that is highly subjective. In any case, if you observe low self-agreement, investigate promptly and find the root cause before proceeding with further annotation.
2. Inter-Annotator Agreement Tests
Another way to measure quality throughout the annotation process is to test for inter-annotator agreement through multi-pass annotation. Multi-pass annotation is where you have two or more annotators annotate the same piece of data. If the annotations match, the label is considered reliable.
If the annotators disagree, you can accept majority rulings. For example, if 2 annotators label an image as a dog and 1 annotator says it’s a cat, the system would automatically discard the cat label and accept the dog label chosen by the majority.
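The majority-rule step can be sketched in a few lines (a hypothetical helper, not a specific platform's API; ties produce no majority and fall through to a human decision):

```python
from collections import Counter

def majority_label(labels):
    """Return the majority label, or None when there is a tie
    (a tie means no majority exists and a human must decide)."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie: escalate rather than guess
    return counts[0][0]

majority_label(["dog", "dog", "cat"])  # -> "dog"
majority_label(["dog", "cat"])         # -> None (tie)
```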
Alternatively, you can also employ adjudication in cases where annotators disagree. With adjudication, a specialist or supervisor is brought in to look at the data and decide which annotation is correct.
Low Inter-Annotator Agreement
If agreement is low, it is an indicator of the objective difficulty of the task. For example, suppose you give three annotators 100 images of dogs and cats and task them with classifying each image as either dog or cat. The task is fairly simple, so you should expect high inter-annotator agreement.
On the other hand, imagine you give those same annotators 100 social media posts and ask them to classify each post as positive, negative, or neutral. This task is much more subjective than identifying cats and dogs. Therefore, while self-agreement is more reflective of individual annotator quality, inter-annotator agreement is a reflection of task difficulty and subjectivity.
What are Acceptable Levels of Self-Agreement and Inter-Annotator Agreement?
Acceptable levels of annotator agreement depend on the specific annotation task you are giving them. If the task is straightforward, e.g. drawing bounding boxes around the cars in each picture, you may not need to monitor inter-annotator agreement at all.
However, subjective tasks like sentiment analysis rely heavily on annotator agreement. As a result, studies in the field have produced benchmark agreement levels that can be used as a rule of thumb. In a 2016 paper on Twitter sentiment analysis, Mozetič et al. state that Alpha > 0.6 is desirable for self-agreement and Alpha > 0.4 is desirable for inter-annotator agreement. Note that Krippendorff’s Alpha is a chance-corrected measure, so these thresholds are not the same as raw percent agreement.
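For illustration, here is a minimal implementation of Krippendorff’s Alpha for nominal labels using the standard coincidence-matrix formulation. It is a sketch for building intuition; for production work, a vetted statistics library is a safer choice:

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's Alpha for nominal labels.
    `units` is a list of label lists, one inner list per annotated item,
    holding the labels the annotators assigned to that item."""
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # items with a single label contribute no pairs
        for c, k in permutations(labels, 2):
            coincidences[(c, k)] += 1 / (m - 1)
    # marginal frequency of each label across all pairable values
    n_c = Counter()
    for (c, _), w in coincidences.items():
        n_c[c] += w
    n = sum(n_c.values())
    observed = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(n_c[c] * n_c[k] for c, k in permutations(n_c, 2))
    if expected == 0:
        return 1.0  # degenerate case: only one label ever used
    return 1.0 - (n - 1) * observed / expected

# Two annotators agree on 2 of 4 items -> 50% raw agreement,
# but chance-corrected Alpha is much lower:
krippendorff_alpha_nominal([["a", "a"], ["a", "b"], ["b", "a"], ["b", "b"]])
# -> 0.125
```

This is why the Alpha thresholds above look low compared to intuitive percent-agreement numbers: Alpha of 0 means agreement no better than chance.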
3. Add Data Noise
When most people think of noise in data, they immediately search for ways of reducing it. However, good training data is representative of the natural environment the final product will be used in. If you train a voice assistant on human speech recorded in a sound booth, that model will likely be strong at recognizing voices in a quiet space free of background noise.
However, people don’t always use voice assistants in quiet places. People use them, for example, to find the nearest cafe while walking down the street. Meanwhile, cars are honking and people are talking in the background. To simulate natural environments, you may want to add noise to your training data. Data noise could allow your model to generalize better across various use cases.
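One common way to do this, sketched below, is to mix a recorded background-noise clip (street sounds, chatter) into clean speech at a chosen signal-to-noise ratio. The function and its arguments are illustrative, not a specific toolkit's API:

```python
import numpy as np

def add_background_noise(clean, noise, snr_db):
    """Mix a background-noise clip into a clean waveform at a target
    signal-to-noise ratio given in decibels."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)
    # loop/trim the noise clip to match the length of the clean signal
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    signal_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # scale the noise so signal_power / noise_power == 10**(snr_db/10)
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    noise = noise * np.sqrt(target_noise_power / noise_power)
    return clean + noise
```

Varying `snr_db` across the training set (e.g. sampling it between 5 and 20 dB) exposes the model to a range of conditions instead of a single noise level.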
Annotation Services Pricing: 3 Tips to Reduce Costs
The most important question you may be asking yourself is: how much will this cost? The price depends on a variety of factors. However, using the information above and Lionbridge as an example, these three tips can help you decrease your overall costs.
1. Use Fewer Annotators on Each Piece of Data
At Lionbridge, we tally up costs based on the number of tasks you require from our annotators. For example, say you have 1,000 images that need to be labeled and you want 3 people to label each image: 1,000 images x 3 annotators = 3,000 annotation tasks.
However, if you determine that the task isn’t that subjective in nature, 2 annotators may suffice: 1,000 images x 2 annotators = 2,000 annotation tasks. As a result, you would reduce the cost per image by about 33% just by removing a single annotator. Please note that reducing the number of annotators can have a drastic impact on the quality of the data. It’s important to seriously consider the type of data and the scale of your project before deciding how many annotators you need.
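The arithmetic above can be wrapped in a small estimator. The $0.05-per-task rate below is a made-up number for illustration, not actual Lionbridge pricing:

```python
def annotation_cost(n_items, annotators_per_item, price_per_task):
    """Total cost when billing is per annotation task."""
    return n_items * annotators_per_item * price_per_task

# Hypothetical rate of $0.05 per task:
three_pass = annotation_cost(1000, 3, 0.05)  # $150.00
two_pass = annotation_cost(1000, 2, 0.05)    # $100.00
savings = 1 - two_pass / three_pass          # about 0.33, i.e. ~33%
```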
2. Use the Service Provider’s Annotation Platform
Many data annotation providers have their own annotation platforms. At Lionbridge, we provide the platform, the crowdworkers, and project management staff. Using our platform makes the whole annotation process run smoothly – from worker onboarding to project delivery.
A good annotation tool will minimize human involvement and maximize efficiency, while employing QA protocols during the annotation process itself.
You can request that annotations be done on your custom platform. However, service providers would need to adjust their workflows and worker onboarding processes. Therefore, using external platforms often increases costs and lowers efficiency by adding an additional training phase.
3. License the Annotation Platform and Use Internal Annotators
A large portion of annotation service costs is used to pay the staff that labels the data. If you have an annotation project that is on the smaller side, one option is to use your internal staff to do the annotations.
At Lionbridge, you can license our all-in-one annotation platform and invite your own annotators into the system to annotate data. Licensing an annotation platform and doing the annotation work internally can drastically reduce costs.
We hope this guide helped strengthen your understanding of annotation services. Using the information above, you should be able to both monitor data quality and minimize annotation costs. Have more questions about data annotation? Get in touch with our team today.