“It’s incredibly important to maintain a high level of data quality, otherwise it’s garbage in — garbage out. We used several internal tools to assess the quality of what Lionbridge AI provided, and the results were very good. We have confidence that we could potentially use the data for critical deep-learning projects.”
“Many underestimate the technical infrastructure and operational excellence needed to get such high quality training data. Services like Lionbridge AI are great for engineering teams who need data fast and at scale. Using the right service is crucial to ensure quality results and a high ROI.”
You're in good company
ASR System Development
For a multinational telecommunications company, our diverse crowd of project managers and linguists created multilingual language models, collected audio training data, and conducted in-market testing for automatic speech recognition software in 10 target languages.
Data Collection & Annotation
Our team collected over 5,000 unique conversations from thousands of global contributors in 16 target languages. Based on client specifications, all conversation data was annotated with speaker information, sentiment tags and more.
Social Media Ad Evaluation
For an ongoing, multi-year project with one of the world’s biggest social networking platforms, we review over 1 million social media ads per month. Our team recruits, educates and manages over 4,000 in-country raters across 10+ markets. The raters represent a wide range of demographic groups.
Sentiment Analysis Data Collection
For a major multinational technology firm, we created over 10,000 unique sentences tagged by category and sentiment across 13 languages.
Proofing Tool Development
Working with our large team of computational linguists, linguists and data engineers, among others, we provide language development, quality assurance and sustained engineering services for the creation and enhancement of state-of-the-art grammar and spelling correction tools, as well as advanced text authoring components.
Speech Recognition Training
Our team of linguists performed pronunciation checks, transcription validation, and pronunciation generation for speech recognition software development.
Speech Data Collection
To improve speech recognition software, our team generated variants of multilingual voice commands to be recorded later by a team of native speakers.
Machine Translation Retraining
Using a team of native speakers for the Japanese-to-Chinese language pair, we helped fine-tune machine translation output for one of Japan’s largest telecommunications firms.
Chatbot Training Data
Our team of linguists helped train a chatbot to recognize and respond to a variety of native and non-native sentences for a leading virtual assistant software company.
Our language experts provided a human review of language, style and layout of large-volume paid advertisements for a major travel company.
We built a customized team and process to select the three most informative comments across hundreds of forum posts for a social news and discussion website.
A team of Arabic speakers labeled the sentiment of social media posts, classifying each piece of content as either positive, negative, or neutral.
OCR System Training
We collected samples of Japanese characters handwritten by native speakers to train an OCR engine to read handwritten documents.
Audio Dataset Creation
We created a richly detailed dataset of Japanese voice recordings that was transcribed by our crowd of native speakers.
Entity Extraction & Annotation
Our native speakers of Japanese reviewed thousands of short texts taken from articles and newspapers, classifying a range of named entities into five different categories.
Translation Corpus Licensing
We licensed 200,000+ Japanese-to-English segments to train machine translation deep-learning models for a top mobile messaging company.
Audio Speech Analysis
For a firm that required native speakers across multiple languages, our crowd evaluated hundreds of machine-generated speech samples. They analyzed pronunciation and flagged any errors to determine overall naturalness.
Translated Variations Against Intents
A team of our specialists translated a document of intents into German, then added a variety of spoken queries.
Using a preferred pool of our English-speaking contributors aged 18-45, we created 10 unique queries for each of the customer’s intents. These encompassed both formal and casual language.
For one of our valued repeat clients, our specialists evaluated the quality of over 10,000 English language text-to-speech audio sets.
Machine Translation Quality Evaluation
Our language specialists rated and compared the quality of Chinese to English machine translations for a global ecommerce company.
For a leading Asian crowdsourcing company, our native speakers assessed the quality of machine-generated speech.
Eye Tracking Data Collection
We created a comprehensive dataset representing a range of ethnicities for a smartphone eye tracking company.
Data Collection & Classification
Our team collected, labeled, and categorized a large amount of data into 29 different categories.
Classification of Search Queries
We labeled over 150,000 search queries in English and Bahasa Indonesia with their intent for use in a context-based search engine.
Using a custom, multi-tier taxonomy, our contributors classified a dataset of thousands of companies according to their services.
We performed part-of-speech tagging on 40,000 short texts in a Southeast Asian language for a leading travel company.
Key Point Annotation
In images of people playing sports, our contributors annotated 17 different visible body parts to help the customer accurately analyze video frames.
Ad Relevance Evaluation
For an ongoing project with a leading technology company, thousands of evaluators across 10 global markets rate the relevance of ads displayed during an online search.
Search Relevance Evaluation
To optimize global search results for a leading search engine, we’ve hired 100,000+ local contributors in over 100 markets to evaluate search queries for accuracy and relevancy.
For a major media company, content moderation tasks are distributed to hundreds of moderators in over 40 markets. Our moderators rate videos for relevance and flag any inappropriate or offensive content.
Geo-Local Data Evaluation
For a leading navigation app, we’ve hired thousands of in-country contributors to complete millions of search relevance and data verification tasks across 40+ global markets.
Entity Recognition for Chatbot Commands
For one of the world’s top technology companies, our contributors identified and labeled entities contained in thousands of voice commands to train a market-leading virtual assistant.
For a multinational mass media company that required around-the-clock support, we sourced a team of international contributors to complete data enrichment tasks for image and text documents in English and German. All tasks were completed within 24 hours according to client specifications.
Our team completed thousands of data enrichment tasks per week on CRM data from a multinational software corporation.