23 Nov 2019 (Sat), 08:30 AM – 05:30 PM IST
Shashank Rao
At EtherLabs, we are building a video platform that provides insights into the numerous live audio and video meetings that organizations conduct every day. To derive critical metrics such as important moments, topics discussed and possible intents from textual data, basic NLP tasks like keyphrase extraction become significantly important. The extracted keyphrases can later be used for downstream tasks like topic modelling, intent detection, recommendation, search, and building knowledge graphs. We employ a completely unsupervised, graph-based approach to identify important keyphrases from large amounts of textual data. To make the graph more aware of the context of the discussion (HR, engineering, marketing, etc.), we use language models, trained and fine-tuned on specific domains, to re-rank the keyphrases based on domain knowledge.
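As a rough illustration of what an unsupervised, graph-based keyword ranker can look like, here is a minimal sketch in plain Python. It is not the EtherLabs pipeline: the stopword list, the co-occurrence window, the damping factor and all function names (`tokenize`, `build_graph`, `pagerank`, `top_keywords`) are assumptions made for this example, and the domain-specific language-model re-ranking step is not covered.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "are", "for", "on", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def build_graph(tokens, window=3):
    """Link words that co-occur within a sliding window (undirected graph)."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Score nodes with a fixed number of PageRank power iterations."""
    nodes = list(graph)
    if not nodes:
        return {}
    scores = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            # Each neighbour shares its score equally among its own neighbours.
            incoming = sum(scores[m] / len(graph[m]) for m in graph[n])
            updated[n] = (1 - damping) / len(nodes) + damping * incoming
        scores = updated
    return scores


def top_keywords(text, k=5):
    """Return the k highest-scoring words, breaking ties alphabetically."""
    scores = pagerank(build_graph(tokenize(text)))
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]
```

Calling `top_keywords(transcript, k=10)` on a meeting transcript would return its ten highest-ranked single words; merging adjacent keywords into multi-word keyphrases is left out of this sketch.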
Keyphrase extraction is a highly researched and well-defined task in the field of NLP. Approaches range from statistical methods (Bag of Words, TF-IDF) to unsupervised methods (graph-based and clustering) to deep learning algorithms applied to a mixture of both. Recent deep-learning-based approaches have yielded high performance for extracting keywords; however, these methods require large amounts of training data and time. Tools like spaCy and Gensim also provide black-box methods to achieve the same.
Although many methods and solutions are available for extracting keywords, we chose a graph-based approach inspired by the well-known TextRank algorithm (itself based on PageRank). The key motivations for choosing this approach are:
Shashank is an AI/ML Engineer at EtherLabs, Bangalore. He has an MS degree in Computer Science (specialization in ML) from Delft University of Technology, Netherlands, and has over 4 years of research and technical experience in domains such as recommendation systems, healthcare, speech & multimedia technology, IoT, NLP and HCI.