What is Content Moderation?

Article by Meiryum Ali | June 13, 2018

More than 100,000 people globally moderate content. But if you skim the headlines on content moderation, the picture is pretty grim. Tech giants employ thousands of content moderators, but most of the jobs are staffed through temporary employment agencies and have high turnover due to the difficult and stressful nature of the work. Meanwhile, the rise of AI looms over the industry, threatening to take over from humans.

Take Facebook as an example. It employs over 7,500 content moderators who have to apply the company’s rules to all the content on Facebook. As the Guardian reports, this is a deeply stressful job: [1]

 

‘[which] involves being exposed to the most graphic and extreme content on the internet, making quick judgments about whether a certain symbol is linked to a terrorist group or whether a nude drawing is satirical, educational, or merely salacious.’

 

The same article also noted how an ACLU director expressed reservations about Facebook’s reliance on AI, saying: ‘AI will not solve these problems [of poor content moderation]…it will likely exacerbate them.’

So is the director right? Not exactly. Content moderation is still in its early days, and it will likely evolve into a healthier partnership between human moderators and AI tools, as this article will explore.

But first, a primer. Exactly what is content moderation?

At its core, content moderation is a form of data labeling: you label or flag data that does not meet a set of acceptable-use guidelines. That data can come in any form, such as articles, images, videos, or audio clips.

Moderators constantly screen, monitor, and approve content in compliance with a company’s guidelines. They ensure that information and data uploaded by users won’t violate intellectual property rights or contain inappropriate content. In some cases, they also respond to feedback or posts on a site. By ‘protecting’ users from unseemly content, moderators help drive higher user engagement; they also help ‘protect’ a company’s online reputation.
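To make the idea concrete, here is a minimal, hypothetical sketch of what labeled content records might look like; the field names and label values are illustrative assumptions, not any particular platform’s schema.

```python
# Hypothetical labeled records a moderator or labeler might produce.
# Field names and label values are illustrative only.
labeled_content = [
    {"id": 101, "type": "comment", "text": "Great article, thanks!", "label": "approve"},
    {"id": 102, "type": "image", "text": "nude_drawing.png", "label": "review"},  # needs human judgment
    {"id": 103, "type": "comment", "text": "Buy cheap watches now", "label": "reject"},  # spam
]
```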

There are 5 different types of moderation:

  • Pre-moderation: checking content before it is published on a site
  • Post-moderation: checking content after it is published
  • Reactive moderation: members of a site flagging content as it appears
  • Distributed moderation: members rating content on a site
  • Automated moderation: the use of technology to reject or approve submissions to a site.

Of these, automated moderation is the only type where AI takes over from humans.
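As a rough illustration of what automated moderation can look like in its simplest form, the hypothetical sketch below rejects submissions that match a blocklist and approves everything else; the blocklist, function name, and decision rules are assumptions for illustration, not a real platform’s filter.

```python
# Minimal sketch of automated (rule-based) moderation.
# The blocklist and decision rules are hypothetical.
BLOCKED_TERMS = {"spam-link.example", "buy cheap watches"}

def automated_moderation(text: str) -> str:
    """Return 'reject' if the text matches a blocked term, otherwise 'approve'."""
    lowered = text.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        return "reject"
    return "approve"

print(automated_moderation("Buy cheap watches now!"))   # reject
print(automated_moderation("Great article, thanks!"))   # approve
```

In practice, the technology doing this work is increasingly a trained model rather than a fixed rule list, which is where data labeling comes in.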

 

Data Labeling

Outsourcing content moderation to AI has taken off thanks to the broader rise of (human) data labelers. Data labeling is the curation and organization of data used to train machines. Labelers first comb through data, marking what is appropriate and what is not. The machine is then trained on those labels to recognize inappropriate content and to process the billions of pieces of content on a site or app. As one AI professor notes: [2]

 

‘All the impressive advances we see with deep learning have come about using what is called ‘supervised learning’ where the data is labelled ‘good’ or ‘bad,’ or ‘Bob’ and ‘Carol’ …we can’t do unsupervised learning as well if the data is unlabeled.’
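As a sketch of the supervised-learning workflow described above, the snippet below trains a toy text classifier on a handful of labeled examples using scikit-learn; the data, labels, and model choice are assumptions made for illustration, not a production moderation pipeline.

```python
# Toy supervised-learning example: learn 'approve' vs 'reject' from labeled text.
# Requires scikit-learn; the data and labels are made up for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "Great article, thanks for sharing",
    "Totally agree with this point",
    "Buy cheap watches now, click here",
    "Free money, visit spam-link.example",
]
labels = ["approve", "approve", "reject", "reject"]  # produced by human labelers

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["Click here for free money"]))  # likely ['reject']
```

The point is simply that the model can only learn the distinction because humans labeled the examples first.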

 

The Future of Content Moderation

Future AI tools will be able to identify and score specific attributes within an item, rather than just classify the content as a whole. Based on those attributes, a relative ‘risk’ score will be generated that determines whether something should be posted immediately, posted but still reviewed, reviewed before posting, or not posted at all.
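Here is a hypothetical sketch of that attribute-based routing: each attribute gets a score, the scores are combined into a single risk value, and thresholds decide how the item is handled. The attribute names, weights, and thresholds below are invented for illustration.

```python
# Hypothetical attribute-based risk scoring and routing.
# Attribute names, weights, and thresholds are illustrative assumptions.
ATTRIBUTE_WEIGHTS = {"nudity": 0.4, "violence": 0.4, "spam": 0.2}

def risk_score(attribute_scores: dict) -> float:
    """Combine per-attribute scores (0.0-1.0) into a single weighted risk value."""
    return sum(ATTRIBUTE_WEIGHTS[name] * attribute_scores.get(name, 0.0)
               for name in ATTRIBUTE_WEIGHTS)

def route(item_attributes: dict) -> str:
    """Map a risk score to one of the four handling decisions."""
    score = risk_score(item_attributes)
    if score < 0.2:
        return "post immediately"
    if score < 0.5:
        return "post, then review"
    if score < 0.8:
        return "review before posting"
    return "do not post"

print(route({"nudity": 0.1, "violence": 0.0, "spam": 0.1}))  # post immediately
```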

But content moderation will also rely on keeping humans in the loop. Humans, for their part, will continue to play an oversight role and play to their strengths: they bring greater contextual understanding and cultural awareness, and can judge content subjectively, so they’ll be much more involved in the complex ‘gray areas’ of decision-making.

An Accenture report goes further [3] and envisions human moderators turning into ‘investigators’ of sorts. These investigators will be armed with data analytics and have access to behavioral data on company platforms. They will be able to start detecting ‘bad actors’ and even predicting bad behavior. The report’s key takeaway:

 

‘When both the AI and investigator work together, investigators can attack the “root cause” of all kinds of problems before they occur, dramatically reducing the need to actually moderate digital content at all’

 

This bold vision of the future sees today’s low-skilled data labeling and moderation roles disappearing, replaced by high-skilled investigators working in tandem with AI to moderate more effectively.

 

Conclusion:

At Lionbridge AI, we believe that while AI technology will be increasingly deployed in the field of content moderation, there will always be a need for humans to provide some oversight and training.

That’s where Lionbridge AI comes in. With 500,000+ skilled workers available around the clock, you can scale your content moderation efforts effectively. Our team of content moderators will work according to your specific guidelines and objectives to ensure compliance with company policies and legal standards.

 

Sources:

[1] Julia Carrie Wong and Olivia Solon, ‘Facebook releases content moderation guidelines – rules long kept secret’, The Guardian, 24 April 2018.

[2] Hope Reese, ‘Is ‘data labeling’ the new blue-collar job of the AI era?’, TechRepublic, 10 March 2016.

[3] ‘Content Moderation: The Future is Bionic,’ Accenture, 2017.

The Author
Meiryum Ali

Freelance writer working at Lionbridge; AI enthusiast
