The Fifth Elephant 2020 edition

On data governance, engineering for data privacy and data science

The ninth edition of The Fifth Elephant will be held in Bangalore on 16 and 17 July 2020.

The Fifth Elephant brings together over one thousand data scientists, ML engineers, data engineers and analysts to discuss:

  1. Data governance
  2. Data privacy and engineering for privacy, including engineering for the Personal Data Protection (PDP) bill.
  3. Data cleaning, annotation, instrumentation and productionizing data science.
  4. Identifying and handling fraud + data security at scale
  5. Feature engineering and ML platforms.
  6. What it takes to create data-driven cultures in organizations of different scales.

Event details:

Dates: 16-17 July 2020
Venue: NIMHANS Convention Centre, Dairy Circle, Bangalore

Why you should attend:

  1. Network with peers and practitioners from the data ecosystem.
  2. Share approaches to solving expensive problems such as cleanliness of training data, annotation, model management and versioning data.
  3. Demo your ideas in the demo sessions.
  4. Join Birds of a Feather (BOF) sessions for productive discussions on focussed topics, or start your own BOF session.

Contact details:
For more information about The Fifth Elephant, call +91-7676332020 or email sales@hasgeek.com


Hosted by

The Fifth Elephant, known as one of the best data science and machine learning conferences in Asia, has transitioned into a year-round forum for conversations about data and ML engineering, data science in production, and data security and privacy practices.

Santosh

Case Study - Information Retrieval from millions of legal documents using Deep Learning models

Submitted May 27, 2020

Information Retrieval (Named Entity Recognition) is one of the most widely used applications in NLP. Though most of us understand the building blocks of named entity recognition frameworks, we are usually blind to the challenges that arise in real-time problem statements, especially those that involve scale. Over the past year, the data science team at CoffeeBeans has had the chance to work in depth on deep learning solutions to such problems.

At a high level, the problem was to develop end-to-end deep learning solutions that extract over 100 fields from each of millions of legal documents, most of which deal with real estate transactions. Each document is 10 to 20 pages long. The framework is an ensemble of classification, extraction and relationship mapping models that extracts entities ranging from document-level information to house addresses and the names of the parties involved in the transaction. The work has resulted in significant cost savings for our clients by reducing their dependence on manual effort.
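As a rough illustration of how classification, extraction and relationship mapping stages might be chained, here is a minimal Python sketch. It is not the framework described in this talk: the deep learning models are replaced by toy rule-based stand-ins, and every name in it (`classify`, `extract_entities`, `map_relations`, `run_pipeline`, the field names and the sample text) is hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Extraction:
    """Output of the three-stage sketch for one document."""
    doc_type: str
    entities: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)


def classify(text: str) -> str:
    """Stage 1 (toy stand-in for a classification model): document-level label."""
    return "deed" if "deed" in text.lower() else "other"


def extract_entities(text: str) -> dict:
    """Stage 2 (toy stand-in for an extraction model): find candidate spans."""
    return {
        "party": re.findall(r"(?:Grantor|Grantee):\s*([A-Z][a-z]+ [A-Z][a-z]+)", text),
        "address": re.findall(r"\d+ [A-Z][a-z]+ (?:Street|Road|Avenue)", text),
    }


def map_relations(text: str, entities: dict) -> list:
    """Stage 3 (toy stand-in for relationship mapping): link each party to its role."""
    relations = []
    for name in entities["party"]:
        role = re.search(r"(Grantor|Grantee):\s*" + re.escape(name), text)
        if role:
            relations.append((name, role.group(1).lower()))
    return relations


def run_pipeline(text: str) -> Extraction:
    """Run the three stages in order and collect their outputs."""
    doc_type = classify(text)
    entities = extract_entities(text)
    return Extraction(doc_type, entities, map_relations(text, entities))


# Hypothetical sample text, invented for this sketch.
sample = "Warranty Deed. Grantor: Asha Rao. Grantee: Ravi Menon. Property: 12 Church Street."
result = run_pipeline(sample)
print(result)
```

In a real system each stage would be a trained model, and the later stages would typically consume the outputs of the earlier ones, for example by choosing which fields to extract based on the document type.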

In this talk, we would like to share our experiences and learnings from this work with the data science community. The presentation starts with an introduction to the problem statement and to the structure and scale of the data. We then discuss the challenges we faced during the data preprocessing, model training and deployment stages, along with the solutions we designed to tackle them. Finally, we share some observations made over the course of the model building process.

Outline

  • Introduction to Information Retrieval
  • Nature and structure of the data
  • Solution Design
  • Model Architecture: A variation of Bi-Directional LSTMs
  • Challenges and solutions
  • Frameworks, Tools and Tech-Stack

Requirements

Basic knowledge of NLP and Deep Learning

Speaker bio

Santosh graduated from IIT Madras with a Dual Degree in Civil and Transportation Engineering. He has over 5 years of experience in data science, with expertise in Deep Learning. He developed an interest in Machine Learning and Computer Vision while he was part of the research group at the Intelligent Transportation Systems Laboratory, IIT Madras. After graduation, Santosh worked at FreeCharge, where he helped build their real-time Fraud Detection Systems, which were based on advanced ML algorithms. Santosh is currently a Lead Data Scientist at CoffeeBeans, working with Fortune 500 clients on prototyping and deploying end-to-end Deep Learning solutions.
