Anthill Inside 2019

A conference on AI and Deep Learning

Make a submission

Accepting submissions till 01 Nov 2019, 04:20 PM

Taj M G Road, Bangalore

About the 2019 edition:

The schedule for the 2019 edition is published here: https://hasgeek.com/anthillinside/2019/schedule

The conference has three tracks:

  1. Talks, in the main conference hall track
  2. Poster sessions featuring novel ideas and projects, in the poster session track
  3. Birds of a Feather (BOF) sessions for practitioners who want to use the Anthill Inside forum to discuss:
    - Myths and realities of labelling datasets for Deep Learning.
    - Practical experience with using Knowledge Graphs for different use cases.
    - Interpretability and its application in different contexts; challenges with GDPR and interpreting datasets.
    - Pros and cons of using custom and open source tooling for AI/DL/ML.

Who should attend Anthill Inside:

Anthill Inside is a platform for:

  1. Data scientists
  2. AI, DL and ML engineers
  3. Cloud providers
  4. Companies which make tooling for AI, ML and Deep Learning
  5. Companies working with NLP and Computer Vision who want to share their work and learnings with the community

For inquiries about tickets and sponsorships, call Anthill Inside on 7676332020 or write to sales@hasgeek.com


Sponsors:

Sponsorship slots for Anthill Inside 2019 are open. See the sponsorship deck for details.


Anthill Inside 2019 sponsors:


Bronze Sponsors

iMerit
Impetus

Community Sponsors

GO-JEK
iPropal
LightSpeed
Semantics3
Google
Tact.AI
Amex

Hosted by

Anthill Inside is a forum for conversations about Artificial Intelligence and Deep Learning, including tools, techniques and approaches for integrating AI and Deep Learning in products and businesses, and engineering for AI.

Debanjana Banerjee

@debanjana

iCASSTLE: Imbalanced Classification Algorithm for Semi Supervised Text Learning

Submitted Jun 15, 2019

Information in the form of text can be found in abundance on the web today and can be mined to solve multifarious problems. Customer reviews, for instance, flow in by the thousands per day across multiple sources and can be leveraged to obtain several insights. Our goal is to extract cases of a rare event, e.g. product recalls, allegations of ethics violations, legal concerns, or threats to product safety, from this enormous amount of data. Manual identification of such cases is extremely labour-intensive as well as time-sensitive, but failure to report them can have a fatal impact on the industry's overall health and dependability; missing even a single case may lead to huge penalties in terms of customer experience, product liability and industry reputation. In this paper, we discuss text classification from Positive and Unlabeled data, i.e. PU classification, where the only class for which training instances are available is a rare event. In iCASSTLE, we propose a two-staged approach: Stage I leverages three unique components of text mining to procure representative training data containing instances of both classes in the right proportion, and Stage II uses the results of Stage I to run a semi-supervised classification. We applied this to multiple datasets differing in the nature of product safety as well as the nature of imbalance, and iCASSTLE is shown to perform better than state-of-the-art methods on the relevant use cases.
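The two-staged setup described in the abstract follows the general shape of classic two-step PU learning: first mine "reliable negatives" out of the unlabeled pool, then train an ordinary two-class learner on positives versus those reliable negatives. The sketch below is a generic illustration of that idea only, not the iCASSTLE method itself; the toy data, the 0.5 quantile cutoff, and the nearest-centroid base learner are all assumptions made for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a small set of known positives (the rare event) and a large
# unlabeled pool that is mostly negative but hides a few positives.
pos = rng.normal(loc=2.0, scale=0.5, size=(20, 2))            # labeled positives
neg_hidden = rng.normal(loc=-2.0, scale=0.5, size=(200, 2))   # unlabeled negatives
pos_hidden = rng.normal(loc=2.0, scale=0.5, size=(5, 2))      # rare positives in U
unlabeled = np.vstack([neg_hidden, pos_hidden])

# Stage I (generic analogue): score unlabeled points by distance to the
# positive centroid and treat the least positive-like half as reliable negatives.
centroid = pos.mean(axis=0)
dist = np.linalg.norm(unlabeled - centroid, axis=1)
reliable_neg = unlabeled[dist > np.quantile(dist, 0.5)]

# Stage II (generic analogue): train a nearest-centroid classifier on
# positives vs. reliable negatives, then label the whole unlabeled pool.
neg_centroid = reliable_neg.mean(axis=0)
pred = (np.linalg.norm(unlabeled - centroid, axis=1)
        < np.linalg.norm(unlabeled - neg_centroid, axis=1)).astype(int)

print(pred.sum(), "of", len(unlabeled), "unlabeled points flagged positive")
```

In practice the base learner would be a real text classifier over document vectors, and the reliable-negative selection is where iCASSTLE's Stage I components come in; the centroid heuristic here only stands in for that step.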

Keywords: Text Mining, PU Classification, Semi Supervised Text Classification, Sentiment Analysis, Latent Semantic Analysis, Word Frequency, Sparsity Treatment, GloVe, Class Imbalance, Recall Maximization, Data Prioritization

Outline

Introduction
The session will kick off with the concept of Rare Events and how they differ from quintessential Anomalies.
Discuss the concept of One-Class Classification in the PU setup, and its challenges in the presence of imbalance.
Basics of Text Classification: Word2Vec, sparsity treatment, LSA, GloVe. This is where concepts of matrix factorization come in handy.
Discuss basic Semi-Supervised Classification: the what and the why. This is where the concept of entropy comes in handy.
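Two of the building blocks named above can be shown in a few lines: LSA is a truncated SVD of a (normally very sparse) term-document matrix, and entropy of predicted class probabilities is a standard confidence signal in semi-supervised self-training. This is a toy sketch with a made-up four-term vocabulary, not material from the talk itself.

```python
import numpy as np

# Tiny term-document count matrix (rows = terms, columns = documents).
X = np.array([
    [2, 1, 0, 0],   # "recall"
    [1, 2, 0, 0],   # "safety"
    [0, 0, 1, 2],   # "review"
    [0, 0, 2, 1],   # "price"
], dtype=float)

# LSA: keep the top-k singular directions; each document gets a dense
# k-dimensional embedding in the latent "topic" space.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
doc_embeddings = (np.diag(s[:k]) @ Vt[:k]).T
print(doc_embeddings.shape)  # (4, 2)

def entropy(p):
    """Shannon entropy of a probability vector; low entropy = confident
    prediction, which self-training uses to pick points to pseudo-label."""
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=-1)

print(entropy(np.array([0.99, 0.01])))  # near 0: confident
print(entropy(np.array([0.5, 0.5])))    # ln 2, maximally uncertain
```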

Methodology
Discuss the detailed methodology of iCASSTLE in an illustrative problem premise.

Experiment
Illustrate comparative performance on simulated as well as real datasets.
Illustrate metrics to gauge continued performance.

Impact and Next Steps
Discuss benefits and generalization followed by areas of improvements.

Requirements

It is advised that the audience be well versed in the basics of Text Classification. If not, we will try to cover it. The audience should be comfortable with concepts of linear algebra, matrix factorization and regression.

Speaker bio

Speaker : Mainak Mitra, Senior Data Scientist, Walmart Labs|Enterprise Data Science

Links

Slides

https://www.slideshare.net/secret/4VYKaH8bXfcpwn

