The Fifth Elephant 2020 edition

On data governance, engineering for data privacy and data science

Discover, Classify and Protect Enterprise Data with Deep Learning and NLP

Submitted by Sanghamitra Bhattacharjee (@sanghamitrab) on May 31, 2020

Status: Submitted


Intelligent, automatic classification of enterprise documents and data is still in its nascent stages. Although some solutions exist for Data Loss Prevention (DLP), the field is not very evolved and existing solutions are incapable of handling complex use cases. In our paper we talk about using deep learning and NLP to classify information automatically. In the course of our work, we realised that this is a neglected area and that there is a dearth of comprehensive solutions. Although a lot of work has gone into NLP, not much has been done on using ML and NLP to understand and classify enterprise data automatically. We present a glimpse of the work done so far and the different algorithms and techniques we have used to develop our classification framework. The framework has been built using open source libraries and tools, which enables enterprises to customise it to their requirements.


The session is targeted at individuals working in Data Science/Machine Learning. It will also be of interest to individuals in other areas of technology, such as Cyber Security, Information Security and Computer Security, who want insights into how deep learning in conjunction with NLP can be used to build self-learning and fault-tolerant systems. Foundational knowledge of data science and AI/ML is desirable.

Session Format and Duration: This will be a 45-minute presentation, followed by a 10-minute Q&A with the audience. We will also demo some of the use cases during the presentation.


No special requirements other than a suitable setup to connect the laptops for the demo.

Speaker bio

Sanghamitra Bhattacharjee is Director, Data and Analytics at RBS with over 18 years of experience in the technology domain. She is an alumna of NIT, Trichy.
She is passionate about applying data and AI/ML to solve complex business and non-business problems. Her experience in leading engineering initiatives spans Cloud Engineering, chatbots and AI/ML on cloud. Prior to RBS, she was part of the leadership team of a B2C startup, Tapzo, which was eventually acquired by Amazon. She has built several key products for mobile analytics, deep personalisation, real-time user targeting and natural language processing.
She is extremely passionate about Diversity and Inclusion and is the workstream lead for D&I at RBS, India. She has been a committee member of Grace Hopper Celebrations, India for the last few years.
She has been a speaker at several international and national conferences including MicroStrategy World, NASSCOM and Agile India.
She also holds a patent at Yahoo! for her work on delivering contextual ads for search engines.

Bhawna Anand is Associate, Data and Analytics at RBS with over 11 years of experience in the technology domain. She is an AMPBA (Data Science) alumna of the Indian School of Business, Hyderabad and is also banking-domain certified in Capital & Financial Markets from IIM, Bangalore. She has experience in fraud analytics, risk analytics, customer analytics, NLP and deep learning neural networks. She is Data Science Technology Ambassador at RBS, India. She has conducted various analytics trainings for the RBS India leadership team and Python for data science training for RBS employees across India, and has represented RBS at campus hiring events for data science.
She has deep experience in analytics: big data, Spark, Python, R, machine learning (supervised, unsupervised and deep learning), AWS Cloud, visualization, AWS Machine Learning, end-to-end production implementation of ML models, and MLOps.



