Extracting and Employing Domain-Specific Knowledge Graphs (DKGraphs)
Submitted by Satnam Singh, PhD (@satnam-datageek) on Tuesday, 13 May 2014
Suppose you have the opportunity to work with a vast amount of unstructured and semi-structured text data in a specific domain, e.g. automobiles, agriculture, medicine, or the internet. Your task is to derive business value from this text by extracting a domain-specific knowledge graph (DKGraph) and employing it for various business use cases. This problem poses several key challenges:
1. Developing DKGraphs is not a common task, so few open source or commercial tools are available for it. One needs to combine NLP, IR, and ML techniques to develop DKGraphs. Which NLP and ML techniques are needed to build and employ them?
2. How to strike a balance between automation and audits by domain experts?
3. How to employ the DKGraphs to derive business value?
In this talk, I will attempt to answer the above questions. I will share my experience of building DKGraphs from scratch in two industries (automobiles and smartphones).
In almost every industry, data is used to make key business decisions, and roughly 70-80% of that data is semi-structured or unstructured text. One of the key challenges is to make sense of this vast amount of text. A DKGraph is a popular way to capture domain knowledge in structured form, and it enables several use cases, e.g. capturing semantic variations of domain-specific entities, using domain-specific rules together with the DKGraph to find errors and inaccuracies in documentation, and enabling system-level diagnosis and root cause analysis.
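To make the idea of "capturing semantic variations of domain-specific entities" concrete, here is a minimal sketch, not the implementation used in the projects described. It assumes a DKGraph can be stored as (subject, relation, object) triples and that surface-form variants are folded into one canonical name before insertion. The synonym table, the `DKGraph` class, and the `has_symptom` relation are all hypothetical names chosen for illustration.

```python
from collections import defaultdict

# Hypothetical surface-form variants mapped to one canonical entity name.
SYNONYMS = {
    "batt": "battery",
    "battery pack": "battery",
    "eng": "engine",
    "motor": "engine",
}


def canonical(term):
    """Map a surface form to its canonical entity name (case-insensitive)."""
    key = term.strip().lower()
    return SYNONYMS.get(key, key)


class DKGraph:
    """A tiny knowledge graph stored as (subject, relation, object) triples."""

    def __init__(self):
        self.edges = defaultdict(set)  # subject -> {(relation, object)}

    def add(self, subject, relation, obj):
        # Both ends are canonicalized so variants collapse onto one node.
        self.edges[canonical(subject)].add((relation, canonical(obj)))

    def objects(self, subject, relation):
        """Return the sorted objects linked to a subject by a relation."""
        return sorted(o for r, o in self.edges[canonical(subject)] if r == relation)


graph = DKGraph()
graph.add("Batt", "has_symptom", "no crank")
graph.add("battery pack", "has_symptom", "dim lights")
graph.add("Motor", "has_symptom", "no crank")

symptoms = graph.objects("BATTERY", "has_symptom")
print(symptoms)
```

In this sketch "Batt", "battery pack", and "BATTERY" all resolve to the same node, so a query on any variant sees the facts recorded under the others. A real system would likely learn or curate the variant table with domain experts rather than hard-code it.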
In this talk, I shall discuss the methodology, results, challenges, and business impact of two DKGraphs. Here is an outline of my talk:
Brief introduction to common text analytics tasks and common NLP modules
Introduction to the DKGraphs
In-depth discussion of the NLP pipeline used to develop DKGraphs (using the automobile and smartphone domains)
Domain-specific Linguistic Preprocessing (Sentence boundary detection, Tokenizer, Morphological Analyzer, POS Tagger, Chunker), Domain-specific Entity Extraction and Entity Disambiguation, Relationship Extraction module
DKGraph Creation and Visualization
Machine learning techniques (Bayesian Networks, Factorial Hidden Markov Models) to employ DKGraphs and enable various use cases e.g. root cause analysis
Business use cases of DKGraphs in the automobile and smartphone industries
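The pipeline stages named in the outline (sentence boundary detection, entity extraction, relationship extraction) can be illustrated with a deliberately simple sketch. It is an assumption-laden toy, not the talk's method: entities come from a hypothetical hand-written gazetteer, sentence splitting is a naive regular expression, and relation extraction is replaced by same-sentence co-occurrence counting. The function names, the `associated_with` relation, and the sample notes are invented for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical domain gazetteer: surface form -> entity type.
GAZETTEER = {
    "battery": "PART",
    "alternator": "PART",
    "screen": "PART",
    "no crank": "SYMPTOM",
    "flicker": "SYMPTOM",
}


def split_sentences(text):
    """Naive sentence boundary detection: split after ., ! or ? plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_entities(sentence):
    """Return the gazetteer terms found in a sentence, in sorted order."""
    lowered = sentence.lower()
    return sorted(
        term for term in GAZETTEER
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    )


def extract_relations(text):
    """Count PART/SYMPTOM pairs that co-occur within the same sentence."""
    counts = Counter()
    for sentence in split_sentences(text):
        for a, b in combinations(extract_entities(sentence), 2):
            if {GAZETTEER[a], GAZETTEER[b]} == {"PART", "SYMPTOM"}:
                # Orient every edge from the part to the symptom.
                part, symptom = (a, b) if GAZETTEER[a] == "PART" else (b, a)
                counts[(part, "associated_with", symptom)] += 1
    return counts


notes = (
    "Customer reports no crank after a cold night. "
    "Battery tested weak, no crank confirmed. "
    "Replaced battery. Screen shows flicker at low brightness."
)
edges = extract_relations(notes)
for (part, relation, symptom), count in sorted(edges.items()):
    print(part, relation, symptom, count)
```

The weighted edges produced this way are the kind of candidate relations a domain expert could audit before they enter the graph, which is one place where the balance between automation and expert review shows up in practice.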
Preliminary knowledge of text mining and data mining would help in understanding the deeper concepts discussed in my talk.
I am a data geek and data scientist with both academic training (a PhD) and 10+ years of work experience building data products from scratch. I have worked in various industries (refineries, automobiles, smartphones, etc.). Apart from data crunching, I love meeting people, outdoor sports, running, and biking.
My detailed profile is available on LinkedIn.