Email Data Analytics

23 Nov 2019 (Sat), 08:30 AM – 05:30 PM IST

Accepting submissions till 01 Nov 2019, 04:20 PM

Taj M G Road, Bangalore

## About the 2019 edition

The schedule for the 2019 edition is published here: https://hasgeek.com/anthillinside/2019/schedule

The conference has three tracks:

Talks in the main conference hall track
Poster sessions featuring novel ideas and projects in the poster session track
Birds of a Feather (BOF) sessions for practitioners who want to use the Anthill Inside forum to discuss:

Myths and realities of labelling datasets for Deep Learning.
Practical experience with using Knowledge Graphs for different use cases.
Interpretability and its application in different contexts; challenges with GDPR and interpreting datasets.
Pros and cons of using custom and open source tooling for AI/DL/ML.

## Who should attend Anthill Inside

Anthill Inside is a platform for:

Data scientists
AI, DL and ML engineers
Cloud providers
Companies which make tooling for AI, ML and Deep Learning
Companies working with NLP and Computer Vision who want to share their work and learnings with the community

For inquiries about tickets and sponsorships, call Anthill Inside on 7676332020 or write to sales@hasgeek.com

## Sponsors

Sponsorship slots for Anthill Inside 2019 are open.

Hosted by Anthill Inside

Anthill Inside is a forum for conversations about risk mitigation and governance in Artificial Intelligence and Deep Learning. AI developers, researchers, startup founders, ethicists, and AI enthusiasts are encouraged to: …

Email Data Analytics

Submitted Apr 15, 2019

Session type: Lecture

Sentiment analysis of text is a well-known machine learning problem in which a given text document is classified as positive, negative, or neutral. In the last few years, sentiment analysis has become a hot topic of scientific and market research in Natural Language Processing (NLP) and Machine Learning.

Existing approaches to sentiment analysis can be grouped into three main categories: knowledge-based techniques, statistical methods, and hybrid approaches. Knowledge-based techniques classify text into affect categories based on the presence of unambiguous affect words such as happy, sad, afraid, and bored. Statistical methods draw on machine learning techniques such as latent semantic analysis, bag of words, Word2Vec, Doc2Vec, Pointwise Mutual Information for Semantic Orientation, and deep learning. To mine opinions in context and identify the feature on which the speaker has expressed an opinion, these methods use the grammatical relationships between words, which are obtained by deep parsing of the text. Hybrid approaches combine machine learning with elements of knowledge representation, such as ontologies and semantic networks, in order to detect semantics that are expressed in a subtle manner.
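To make the knowledge-based category concrete, here is a minimal sketch of a lexicon-based classifier. It is not the speakers' method: the tiny affect lexicon, the negator list, and the function name `lexicon_sentiment` are hypothetical, chosen only to show how unambiguous affect words drive the label.

```python
# Hypothetical toy affect lexicon; real systems rely on curated affect resources.
AFFECT_LEXICON = {
    "happy": 1, "pleased": 1, "great": 1, "thanks": 1,
    "sad": -1, "afraid": -1, "bored": -1, "disappointed": -1,
}
# Hypothetical minimal negation handling: flip the polarity of the next affect word.
NEGATORS = {"not", "never", "no"}


def lexicon_sentiment(text: str) -> str:
    """Label text positive, negative or neutral by summing affect-word scores."""
    tokens = [t.strip(".,!?;:").lower() for t in text.split()]
    score = 0
    for i, tok in enumerate(tokens):
        polarity = AFFECT_LEXICON.get(tok, 0)
        # Invert the word's polarity if it is directly preceded by a negator.
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, "We are happy with the demo." is labelled positive, "I am not happy with the delay." negative, and "Please find the invoice attached." neutral. Politely hedged corporate phrasing rarely contains such unambiguous affect words, which is one reason statistical and embedding-based methods are needed for email.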

Sentiment analysis on emails is a comparatively complex task, and only a few email datasets are publicly available, such as the Enron email dataset: https://www.kaggle.com/wcukierski/enron-email-dataset

We show one industrial use case of sentiment analysis on email data, which we are implementing at Freshworks for use by Freshworks clients. We analyse email conversations between sales agents and their customers to predict the underlying deal outcomes. We also use this data to identify high-potential customers/targets by extracting sentiment from emails. The key challenge is that corporate emails are generally written in a politically correct (and therefore logically complex) manner, so extracting the correct sentiment from them is not easy. We use contextualised word embeddings to capture different contexts. We further employ a method that upscales important words and downscales less important words in the corpus. Finally, we train different classifiers to extract sentiment from these emails and compare them using well-established evaluation measures such as precision, recall, and accuracy.
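The proposal does not say how words are upscaled or downscaled. One common, standard way to do this is TF-IDF weighting, sketched below purely as an assumption: terms that appear in every email (such as "the") get low weights, while rarer, more discriminative terms get higher ones. The function name `tfidf_weights` is hypothetical, and the smoothed IDF formula mirrors scikit-learn's default `smooth_idf=True` setting.

```python
import math
from collections import Counter


def tfidf_weights(corpus: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: tf-idf weight} mapping per tokenised document.

    Assumed scheme (not confirmed by the talk): term frequency normalised by
    document length, multiplied by smoothed inverse document frequency.
    """
    n_docs = len(corpus)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in corpus for term in set(doc))
    # Smoothed IDF: terms present in every document get the minimum weight of 1.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    weights = []
    for doc in corpus:
        tf = Counter(doc)
        total = len(doc)
        weights.append({t: (cnt / total) * idf[t] for t, cnt in tf.items()})
    return weights
```

With the three tokenised emails `["thanks", "for", "the", "quick", "reply"]`, `["the", "pricing", "looks", "high"]` and `["thanks", "the", "demo", "was", "great"]`, the word "the" occurs everywhere and keeps an IDF of 1, while "pricing" occurs once and is weighted roughly 1.69 times higher. If such weights were combined with contextualised embeddings, a natural (again hypothetical) design is to scale each token vector by its weight before pooling into an email-level vector.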

Outline

At the conference, we will talk about:

Overview of the problem statement and its challenges
Value proposition of solving this problem
Overview of contextualised word embeddings (BERT, ELMo, etc.) and how they can be applied to our data
A method that upscales important words and downscales less important words in the corpus
An implementation of our framework
Hyperparameter tuning and accuracy measures, followed by a comparison of the different algorithms used in our work
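The comparison step relies on precision, recall, and accuracy. A minimal sketch of how these can be computed for one target class is shown below; the function name `evaluate`, the classifier names, and the labels in the usage example are hypothetical and do not come from the talk.

```python
def evaluate(y_true: list[str], y_pred: list[str], positive: str = "positive") -> dict[str, float]:
    """Accuracy over all labels, plus precision and recall for one target class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # Counts for the chosen target class only.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical gold labels and predictions from two hypothetical classifiers.
    gold = ["positive", "negative", "neutral", "positive", "negative"]
    predictions = {
        "classifier_a": ["positive", "negative", "neutral", "negative", "negative"],
        "classifier_b": ["positive", "positive", "neutral", "positive", "neutral"],
    }
    for name, pred in predictions.items():
        print(name, evaluate(gold, pred))
```

Here `classifier_a` is more accurate (0.8 versus 0.6) and more precise on the positive class, while `classifier_b` has perfect recall on it; this is the kind of trade-off that a comparison on several measures exposes. For per-class and averaged scores over all three sentiment labels, scikit-learn's `classification_report` is a common alternative.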

Requirements

Speaker bio

Rahul Sharma is a Data Scientist with 7 years of industry experience in applying data science and advanced machine learning techniques to diverse sectors, including telecom, healthcare, and equity research for quantitative hedge funds. Rahul has completed more than 20 large-scale machine learning projects using structured and unstructured datasets, including datasets hundreds of terabytes in size. Rahul holds a master’s degree in computer science from the Indian Institute of Technology (IIT) Kharagpur, India.

Please note that this session will have two speakers: Rahul and Swaminathan.