The Fifth Elephant 2024 Annual Conference (12th & 13th July)
Maximising the Potential of Data — Discussions around data science, machine learning & AI
13 Jul 2024 (Sat), 09:00 AM – 06:05 PM IST
We rely heavily on the Web to meet our information needs today. Examples include Wikipedia, Twitter, Instagram, YouTube, Google Maps, and more. All of these are platforms where millions of users post billions of pieces of content every day on a wide range of topics, and that content is consumed by hundreds of millions of users. While a rich source of information, these platforms are also easy targets for abuse and harm, both intentional and unintentional. Intentional harm includes the use of these platforms for fraud, misinformation, trolling, hate, and other forms of vandalism. Unintentional harm includes factually incorrect information, stale information, information bias, and more.
Platform providers carry a huge responsibility for ensuring that users get a safe, delightful, useful, and transparent experience with the information presented to them. However, this is a very difficult problem to solve in the real world, and it comes with a large number of very hard challenges.
For example:
How do we deal with highly motivated bad actors who are technically savvy, have financial means, and put in continuous effort to identify and exploit vulnerabilities, adapting at a very fast pace?
How do we differentiate between facts and opinions and determine the actual “ground truth” label, do it at scale with thorough representation of all kinds of patterns, and do it fast, before the patterns change, so that our models get good exposure to the underlying problem?
How do we deal with signal sparsity? What if we simply don’t have any useful signals to train the classifier on?
What do we do if the tolerance for errors and mistakes is very low and yet we don’t have an effective classifier to solve the problem? (A small illustrative sketch follows this list.)
How do we solve this for a large number of semantically different types of documents?
How do we build an efficient and reliable system that can handle billions of documents?
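To make the signal-sparsity and low-error-tolerance questions above concrete, here is a minimal, purely illustrative sketch in Python with scikit-learn. It is not the system described in this talk; the synthetic dataset, the ~1% positive rate, and the 95% precision target are all assumptions chosen for illustration. It shows two common mitigations: class weighting when positive “abuse” labels are scarce, and choosing a decision threshold on held-out data so that a strict precision target is met.

```python
# Illustrative sketch only: imbalanced "abuse vs. benign" classification with
# class weighting, plus threshold selection for a strict precision target.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for sparse abuse signals: roughly 1% positive examples.
X, y = make_classification(
    n_samples=50_000, n_features=20, weights=[0.99, 0.01], random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Class weighting is one common way to cope with very few positive labels.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_train, y_train)

# With low tolerance for mistakes, pick the threshold on held-out data that
# reaches a target precision (assumed 95% here) and accept the remaining recall.
scores = clf.predict_proba(X_val)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_val, scores)
target_precision = 0.95
ok = np.where(precision[:-1] >= target_precision)[0]  # thresholds aligns with precision[:-1]
if ok.size:
    i = ok[0]
    print(f"threshold={thresholds[i]:.3f} "
          f"precision={precision[i]:.3f} recall={recall[i]:.3f}")
else:
    print("No threshold meets the precision target; better signals are needed.")
```

In a real deployment the held-out set used for threshold selection would need to be refreshed continuously, because adversarial patterns drift quickly, which is exactly what makes the problem hard at scale.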
In this short talk, we will elaborate on this critical and pressing challenge faced by the tech industry and give you insights into how the above challenges, and many more, are solved in practice.
References:
https://www.astesj.com/v02/i01/p03/#1638604295142-6e9377b7-ec34
https://deepblue.lib.umich.edu/bitstream/handle/2027.42/147111/rssa00397.pdf;sequence=1
https://minds.wisconsin.edu/bitstream/handle/1793/60660/TR1648.pdf?sequence=1
https://cseweb.ucsd.edu//~elkan/rescale.pdf
http://proceedings.mlr.press/v97/byrd19a/byrd19a.pdf
http://ai.stanford.edu/people/ronnyk/roc.pdf
https://conservancy.umn.edu/bitstream/handle/11299/215731/07-017.pdf?sequence=1