Content moderation is the protective barrier that makes social media platforms a safe place for everyone. It’s the gatekeeping mechanism tech giants like Google, Facebook and Twitter employ to control the billions of user-generated posts uploaded to their platforms each week. Much of the process is supported by “community operations” teams made up of thousands of in-house moderators who manually review posts that are flagged for inappropriate content.
Now, more and more companies are investing in machine learning technology to help moderate existing content and prevent offensive content from appearing in the first place. Well-trained models can reduce dependency on human moderators and speed up the overall moderation process.
However, in order to detect things like hate speech, bad words, and offensive content, these algorithms need to digest vast amounts of training data. To help you get started with building your own content moderation system, we at Lionbridge have put together the best open-source content moderation datasets for machine learning.
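To make the idea concrete, here is a minimal sketch of the kind of classifier such training data feeds. It uses a tiny hand-invented toy corpus and a from-scratch multinomial Naive Bayes; a real system would train a far stronger model on one of the labeled datasets listed below.

```python
from collections import Counter
import math

def train_nb(texts, labels):
    """Train a tiny multinomial Naive Bayes on whitespace-tokenized texts."""
    counts = {0: Counter(), 1: Counter()}
    class_totals = Counter(labels)
    for text, label in zip(texts, labels):
        counts[label].update(text.lower().split())
    vocab = set(counts[0]) | set(counts[1])
    return counts, class_totals, vocab

def predict_nb(model, text):
    """Return 1 if the text looks toxic, 0 otherwise."""
    counts, class_totals, vocab = model
    tokens = text.lower().split()
    best_label, best_score = None, -math.inf
    for label in (0, 1):
        # log prior + log likelihoods with add-one (Laplace) smoothing
        score = math.log(class_totals[label] / sum(class_totals.values()))
        total = sum(counts[label].values()) + len(vocab)
        for tok in tokens:
            score += math.log((counts[label][tok] + 1) / total)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented toy corpus for illustration only; 0 = acceptable, 1 = toxic.
train_texts = [
    "you are a wonderful person",
    "have a great day everyone",
    "thanks for sharing this",
    "you are an idiot",
    "nobody wants you here",
    "shut up you moron",
]
train_labels = [0, 0, 0, 1, 1, 1]

model = train_nb(train_texts, train_labels)
print(predict_nb(model, "you are an idiot"))  # → 1
```

Real moderation pipelines replace the toy corpus with one of the datasets below and the Naive Bayes with stronger models, but the shape — labeled examples in, a toxic/acceptable decision out — is the same.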
**Warning: These links contain material that many will find offensive. (But that’s the point!)**
**General Content Moderation Datasets for Machine Learning**
NSFW Data Scraper: A set of scripts for the automatic collection of tens of thousands of sexually explicit images, to be used later for training an image classifier.
Bad Bad Words: Contains a lot of bad words. This dataset was created to support the Toxic Comment Classification Competition and to help build a model that detects toxicity levels in language.
Hateful Users on Twitter: Contains a network of 100k users, of which ~5k were annotated as hateful or not. For each user, several content-related, network-related, and activity-related features are provided.
Twitter Sentiment Analysis: The objective of this task is to detect hate speech (racist or sexist sentiment) in tweets. The training data provides full tweet texts with their labels.
Hate Speech Dataset from a White Supremacy Forum: Text extracted from Stormfront, a white supremacist forum. A random set of forum posts was sampled from several subforums and split into sentences. Those sentences were manually labelled as containing hate speech or not, according to certain annotation guidelines.
Offensive/Profane Word List: A list of 1,300+ English terms that could be found offensive. The list contains some words that many people won’t find offensive, but it’s a good start for anybody wanting to block offensive or profane terms on their site.
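A list like this can back a simple first-pass filter. The sketch below uses placeholder terms rather than the real list, and the helper name is our own; in practice you would load the full 1,300+ terms from a file.

```python
import re

# Placeholder blocklist; in practice, load the full term list from a file.
BLOCKED_TERMS = {"badword1", "badword2", "slur1"}

# Word-boundary matching so that, e.g., "class" is never flagged for
# containing "ass" as a substring.
_pattern = re.compile(
    r"\b(" + "|".join(map(re.escape, sorted(BLOCKED_TERMS))) + r")\b",
    re.IGNORECASE,
)

def find_blocked_terms(post):
    """Return the blocked terms found in a post, lowercased and deduplicated."""
    return sorted({m.lower() for m in _pattern.findall(post)})

print(find_blocked_terms("This post contains BadWord1 and slur1."))
# → ['badword1', 'slur1']
```

Keep in mind that exact matching misses obfuscated spellings ("b4dword"), which is one reason word lists are usually paired with trained models rather than used alone.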
Big List of Naughty Strings: An evolving list of strings which have a high probability of causing issues when used as user-input data. This is intended for use in helping both automated and manual QA testing; useful for whenever your QA engineer walks into a bar.
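In QA, such strings are typically fed through an input handler as a smoke test, asserting that nothing crashes and invariants hold. The sample strings and the `sanitize` function below are our own illustrative stand-ins; a real test would load `blns.txt` from the repository and exercise your actual input-handling code.

```python
# Tiny hand-picked sample in the spirit of the Big List of Naughty Strings.
NAUGHTY_SAMPLES = [
    "",                               # empty input
    "null",                           # literal string some parsers coerce
    "'; DROP TABLE users;--",         # SQL-injection-shaped input
    "<script>alert(1)</script>",      # HTML/JS-injection-shaped input
    "\u202egnirts detrevni",          # right-to-left override character
]

def sanitize(text):
    """Hypothetical handler under test: must never raise, always return str."""
    return text.strip()[:280]

# Smoke test: every naughty string goes through without an exception,
# and the output respects the length invariant.
for s in NAUGHTY_SAMPLES:
    result = sanitize(s)
    assert isinstance(result, str) and len(result) <= 280
print("all naughty samples handled")
```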
**Multilingual Content Moderation Datasets**
Hatebase: Built to help companies, NGOs, research organizations, and government agencies moderate online conversations, Hatebase is the world’s largest structured repository of multilingual hate speech.
Hate speech detection in the Indonesian language: Contains 713 labeled tweets in the Indonesian language.
Dirty Naughty Obscene and Otherwise Bad Words: A repository filled with bad words across 25 languages (including Klingon and Esperanto!) used to filter out bad results.
Keep in mind that not all of these will work for every content moderation model. You might have your own custom tags or formatting requirements based on your system and target audience. In that case, you’ll need tailored data annotation services to create the custom data you need. Need a hand? With a network of over 500,000 qualified raters, linguists, and annotators, Lionbridge AI provides large-scale content moderation services in over 300 languages.