Anthill Inside 2019

On infrastructure for AI and ML: from managing training data to data storage, cloud strategy and costs of developing ML models

Building a Context-Aware Knowledge graph using Graph analysis & Language models

Submitted by Shashank Rao (@shashankpr) on Sunday, 14 April 2019

Abstract

Introduction

At EtherLabs, we are building a video platform that provides insights into the numerous live audio and video meetings that organizations conduct every day. To extract critical metrics such as important moments, topics discussed and possible intents from the textual data, basic NLP tasks like keyphrase extraction become significantly important. The extracted keyphrases can later be used for downstream tasks like topic modelling, intent detection, recommendation, search, and building knowledge graphs. We employ a completely unsupervised, graph-based approach to identify important keyphrases from large amounts of textual data. To make the graph more aware of the context of the discussion (HR, engineering, marketing, etc.), we use language models, trained and fine-tuned on specific domains, to re-rank the keyphrases based on domain knowledge.

Motivation

Keyphrase extraction is a highly researched and well-defined task in the field of NLP. Approaches range from supervised methods (built on features such as Bag of Words and TF-IDF) to unsupervised methods (graph-based and clustering) to deep learning algorithms applied to a mixture of both. Recent advances in deep learning-based approaches have yielded high performance for extracting keywords; however, these methods require large amounts of training data and time. Many tools like spaCy and Gensim also provide black-box methods to achieve the same.

Although many methods and solutions are available for extracting keywords, we chose to work on a graph-based approach inspired by the well-known TextRank algorithm (itself based on PageRank). The key motivations for choosing this approach are:

  • Text data carry important structural information. This information can be captured by word graphs, with words forming the nodes and their co-occurrences forming the edges or relations.
  • Graph-based methods work well with noisy text data and need no training, so they impose no training-data constraints.
  • An unsupervised method lets us obtain candidate keywords, which can be further filtered using other methods like syntax rules, language models and ML classifiers.
  • Graph-based extraction enables us to visualize and interpret how keywords are identified. This level of explainability helps in further fine-tuning the task, which would have been tough to do if deep learning algorithms were used.
  • Graph analysis on the word graphs provides other insights, for example through community detection, which can be used for detecting potential topics.
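
As a rough illustration of the word-graph idea in the motivations above, here is a minimal sketch in plain Python. It is not the EtherLabs implementation: the function names (`build_word_graph`, `pagerank`), the window size, the stopword list and the sample sentence are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_word_graph(tokens, window=3):
    """Build an undirected co-occurrence graph.

    Words are nodes; the weight of an edge counts how many sliding
    windows of `window` tokens contain both words.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for start in range(max(1, len(tokens) - window + 1)):
        span = set(tokens[start:start + window])
        for a, b in combinations(sorted(span), 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration over {node: {neighbour: weight}}."""
    nodes = list(graph)
    if not nodes:
        return {}
    # Every node in the graph has at least one edge, so strength is never zero.
    strength = {node: sum(graph[node].values()) for node in nodes}
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        rank = {
            node: (1.0 - damping) / len(nodes)
            + damping * sum(
                rank[nbr] * weight / strength[nbr]
                for nbr, weight in graph[node].items()
            )
            for node in nodes
        }
    return rank


if __name__ == "__main__":
    # Hypothetical stopword list and sample text, for illustration only.
    stopwords = {"the", "a", "and", "of", "in", "to", "we"}
    text = ("we build a word graph and rank the words in the graph "
            "to find the keyphrases of the meeting")
    tokens = [w for w in text.split() if w not in stopwords]
    ranks = pagerank(build_word_graph(tokens))
    for word, score in sorted(ranks.items(), key=lambda kv: -kv[1])[:5]:
        print(f"{word}: {score:.3f}")
```

In TextRank-style pipelines, candidate keyphrases are then usually formed by joining adjacent high-ranking words; that step is omitted here.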

Outline

  • The concept behind building a word graph and computing keyword ranks using the PageRank algorithm.
  • Using sentence embeddings from language models to bias the PageRank computation.
  • Using graph analysis methods like Betweenness Centrality and the Louvain partitioning algorithm to detect topics (communities).
  • Extending the word graph to a knowledge graph to get other relations in the data.
  • Exploring graph databases, Dgraph in particular, to persist graphs.
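
The proposal does not say how the embeddings bias PageRank. One common option, shown here purely as an assumption, is personalized PageRank: the random-jump (teleport) distribution is weighted by each word's similarity to a domain vector. The toy 2-D embeddings, the `domain_vector` and all function names below are hypothetical; a real system would take vectors from a fine-tuned language model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def teleport_weights(embeddings, domain_vector):
    """Turn similarity to the domain vector into a probability distribution."""
    # Negative similarities are clipped to zero so weights stay non-negative.
    scores = {
        word: max(cosine(vec, domain_vector), 0.0)
        for word, vec in embeddings.items()
    }
    total = sum(scores.values())
    if total == 0.0:
        return {word: 1.0 / len(scores) for word in scores}
    return {word: score / total for word, score in scores.items()}


def biased_pagerank(graph, teleport, damping=0.85, iterations=50):
    """Personalized PageRank over a symmetric {node: {neighbour: weight}} graph.

    Assumes every node has at least one edge and that `teleport`
    sums to 1 over the graph's nodes.
    """
    nodes = list(graph)
    strength = {n: sum(graph[n].values()) for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        rank = {
            n: (1.0 - damping) * teleport.get(n, 0.0)
            + damping * sum(rank[m] * w / strength[m] for m, w in graph[n].items())
            for n in nodes
        }
    return rank


if __name__ == "__main__":
    # Hypothetical 4-word cycle graph: unbiased PageRank would tie all words.
    graph = {
        "hiring": {"interview": 1.0, "server": 1.0},
        "interview": {"hiring": 1.0, "deploy": 1.0},
        "deploy": {"interview": 1.0, "server": 1.0},
        "server": {"deploy": 1.0, "hiring": 1.0},
    }
    # Toy embeddings: first axis ~ "HR", second axis ~ "engineering".
    embeddings = {
        "hiring": [0.9, 0.1],
        "interview": [0.8, 0.2],
        "deploy": [0.1, 0.9],
        "server": [0.2, 0.8],
    }
    hr_vector = [1.0, 0.0]
    ranks = biased_pagerank(graph, teleport_weights(embeddings, hr_vector))
    for word, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
        print(f"{word}: {score:.3f}")
```

Because the example graph is symmetric, any difference in the final ranks comes only from the teleport weights, which makes the effect of the domain bias easy to see.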

Speaker bio

Shashank is an AI/ML Engineer at EtherLabs, Bangalore. He has an MS degree in Computer Science (specialization in ML) from Delft University of Technology, Netherlands, and has over 4 years of research and technical experience in domains such as recommendation systems, healthcare, speech & multimedia technology, IoT, NLP and HCI.

Comments

  • Zainab Bawa (@zainabbawa) Reviewer 2 months ago

    Thanks for the submission, Shashank. Proposals on deep learning, speech recognition, NLP and computer vision are considered for Anthill Inside, since the conference covers these topics.

    You have to submit detailed slides, explaining:

    1. What is the problem that you were/are trying to solve? This problem (with a detailed context) has to be generalizable across a large segment of participants. Therefore, the problem statement cannot be about a specific issue you faced at EtherLabs.
    2. Why did you choose the described approach to solve your problem? What other approaches did you consider/compare in solving this problem? Why is this approach better than the other approaches you evaluated/considered?
    3. Share details about your solution, including a deep dive.
    4. Show before-and-after comparisons, including how this approach improved the situation (if it did) and what trade-offs or compromises you had to make in the process of implementation.
    5. How is your design/approach better? What is the big win that you have achieved with this innovation? Therefore, what is it that the participants should take away from your work?

    On or before 21 May, you have to submit two things for us to evaluate your proposal and make a decision:

    1. Slides explaining each of the above points.
    2. A two-minute preview video where you make an elevator pitch about your talk: in two minutes, tell participants why your proposed talk is interesting and what novel things they will learn from it.

    Upload slides and preview video to your proposal.
