The Fifth Elephant 2024 Annual Conference (12th & 13th July)

Maximising the Potential of Data — Discussions around data science, machine learning & AI

Hari Bhaskar S

Nikhil Ketkar

@nikhilketkar

Content Moderation Systems at Scale

Submitted Jun 19, 2024

We rely heavily on the Web to meet our information needs today. Examples include Wikipedia, Twitter, Instagram, YouTube, Google Maps, etc. On all of these platforms, millions of users post billions of pieces of content every day on a wide range of topics, and that content is consumed by hundreds of millions of users. While they are a rich source of information, these platforms are also easy targets for abuse and harm, both intentional and unintentional. Intentional harm includes the use of these platforms for fraud, misinformation, trolling, hate and other forms of vandalism. Unintentional harm includes factually incorrect information, stale information, information bias and more.
The underlying platform providers have a huge responsibility to ensure that users get a safe, delightful, useful and transparent experience of the information presented to them. However, this is a very difficult problem to solve in the real world, and it comes with a large number of hard challenges.
For example:
How do we deal with highly motivated bad actors who are technically savvy, have financial means, and work continuously to identify vulnerabilities and exploit them, adapting at a very fast pace?
How do we differentiate between facts and opinions and determine the actual “ground truth label”? How do we do this at scale, with thorough representation of all kinds of patterns, and fast enough, before the patterns change, that our models get good exposure to the underlying problem?
How do we deal with signal sparsity? What if we simply don’t have any useful signals to train the classifier on?
What do we do if the tolerance for errors and mistakes is very low, and yet we don’t have an effective classifier to solve the problem?
How do we solve this for a large number of semantically different types of documents?
How do we build an efficient and reliable system that can deal with billions of documents?

In this short talk, we will elaborate on this critical and pressing challenge faced by the tech industry and give you insights into how the above challenges, and many more, are solved in the

References:
https://www.astesj.com/v02/i01/p03/#1638604295142-6e9377b7-ec34
https://deepblue.lib.umich.edu/bitstream/handle/2027.42/147111/rssa00397.pdf;sequence=1
https://minds.wisconsin.edu/bitstream/handle/1793/60660/TR1648.pdf?sequence=1
https://cseweb.ucsd.edu//~elkan/rescale.pdf
http://proceedings.mlr.press/v97/byrd19a/byrd19a.pdf
http://ai.stanford.edu/people/ronnyk/roc.pdf
https://conservancy.umn.edu/bitstream/handle/11299/215731/07-017.pdf?sequence=1
