Extracting and Employing Domain-Specific Knowledge Graphs (DKGraphs)

Jul 2014

23 Wed 09:30 AM – 05:00 PM IST
24 Thu 09:45 AM – 05:00 PM IST
25 Fri 08:30 AM – 07:15 PM IST
26 Sat 08:30 AM – 07:15 PM IST

NIMHANS Convention Centre, Bangalore


Submitted May 13, 2014

Section: Full talk
Technical level: Beginner

Assume that you have an opportunity to work with a vast amount of unstructured and semi-structured text data in a specific domain, e.g. automobiles, agriculture, medicine, or the internet. Your task is to derive business value from this text by extracting a domain-specific knowledge graph (DKGraph) and employing it for various business use cases. This problem poses several key challenges:

Building DKGraphs is not a common task, so few open source or commercial tools are available for it. One needs to combine NLP, IR, and ML techniques. Which NLP and ML techniques are needed to build and employ DKGraphs?
How do you balance automation against audits by domain experts?
How do you employ DKGraphs to derive business value?

In this talk, I will attempt to answer the above questions. I will share my experience of building DKGraphs from scratch in two industries (automobiles and smartphones).

Outline

In almost every industry, data is used to make key business decisions, and roughly 70-80% of that data is semi-structured or unstructured text. One of the key challenges is making sense of this vast body of text. A DKGraph is a popular way to capture domain knowledge in structured form, and it enables several use cases, e.g. capturing semantic variations of domain-specific entities, using domain-specific rules together with the DKGraph to find errors and inaccuracies in documentation, and enabling system-level diagnosis and root cause analysis.
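As a minimal sketch of the first use case, capturing semantic variations of a domain-specific entity can be pictured as a lookup from surface forms to one canonical name. The alias table and the `canonicalize` helper below are hypothetical illustrations, not the systems described in this talk; a real DKGraph would learn or curate these variants rather than hard-code them.

```python
import re

# Hypothetical alias table (assumption): surface variant -> canonical entity.
ALIASES = {
    "brake pad": "brake_pad",
    "brake pads": "brake_pad",
    "brk pad": "brake_pad",
    "battery": "battery",
    "batt": "battery",
}


def canonicalize(text):
    """Return the set of canonical entities whose variants appear in text."""
    lowered = text.lower()
    return {
        canonical
        for variant, canonical in ALIASES.items()
        # Word boundaries stop "batt" from matching inside "battle".
        if re.search(r"\b" + re.escape(variant) + r"\b", lowered)
    }


mentions = canonicalize("Customer says brk pad worn; batt drains fast")
```

Here two differently worded mentions resolve to the same two graph nodes, which is what lets later analysis count them together.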

In this talk, I will discuss the methodology, results, challenges, and business impact of two DKGraphs. Here is an outline of my talk:

Brief introduction to common text analytics tasks and common NLP modules
Introduction to the DKGraphs
In-depth discussion of the NLP pipeline used to develop DKGraphs (using the automobile and smartphone domains)
Pipeline stages: domain-specific linguistic preprocessing (sentence boundary detection, tokenizer, morphological analyzer, POS tagger, chunker), domain-specific entity extraction and entity disambiguation, and relationship extraction
DKGraph Creation and Visualization
Machine learning techniques (Bayesian Networks, Factorial Hidden Markov Models) to employ DKGraphs and enable various use cases e.g. root cause analysis
Business use cases of DKGraphs in the automobile and smartphone industries
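To make the pipeline in the outline concrete, here is a deliberately simplified sketch of how raw text might flow through sentence splitting, entity extraction, relationship extraction, and graph creation. Everything in it is an assumption for illustration: the `LEXICON` entries, the `has_symptom` relation, and the same-sentence co-occurrence rule stand in for the domain-specific NLP modules named above, and the machine learning step is not shown.

```python
import re
from collections import defaultdict

# Hypothetical domain lexicon (assumption): surface form -> (name, type).
LEXICON = {
    "battery": ("battery", "PART"),
    "alternator": ("alternator", "PART"),
    "engine": ("engine", "PART"),
    "not starting": ("no_start", "SYMPTOM"),
    "overheating": ("overheating", "SYMPTOM"),
}


def split_sentences(text):
    """Naive sentence boundary detection: split after ., ! or ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_entities(sentence):
    """Dictionary lookup standing in for domain-specific entity extraction."""
    lowered = sentence.lower()
    hits = []
    for surface, (name, etype) in LEXICON.items():
        match = re.search(r"\b" + re.escape(surface) + r"\b", lowered)
        if match:
            hits.append((match.start(), name, etype))
    # Return entities in order of appearance.
    return [(name, etype) for _, name, etype in sorted(hits)]


def extract_relations(entities):
    """Link every PART to every SYMPTOM found in the same sentence."""
    parts = [n for n, t in entities if t == "PART"]
    symptoms = [n for n, t in entities if t == "SYMPTOM"]
    return [(p, "has_symptom", s) for p in parts for s in symptoms]


def build_graph(text):
    """Return {head: {(relation, tail): count}} built from raw text."""
    graph = defaultdict(lambda: defaultdict(int))
    for sentence in split_sentences(text):
        for head, rel, tail in extract_relations(extract_entities(sentence)):
            graph[head][(rel, tail)] += 1
    return graph


notes = (
    "Engine not starting after cold night. "
    "Battery replaced but engine not starting again. "
    "Engine overheating on highway."
)
graph = build_graph(notes)
```

The edge counts are the kind of evidence that a downstream model could use for root cause analysis, though co-occurrence alone says nothing about causation and would need expert audit.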

Requirements

Preliminary knowledge of text mining and data mining would be helpful in understanding the deeper concepts discussed in my talk.

Speaker bio

I am a data geek and data scientist with both an academic background (PhD) and 10+ years of work experience in building data products from scratch. I have worked in various industries (refineries, automobiles, smartphones, etc.). Apart from data crunching, I love meeting people, outdoor sports, running, and biking.

My detailed profile is available on LinkedIn:
http://in.linkedin.com/pub/satnam-singh-phd/2/349/347

The Fifth Elephant 2014