## About the 2019 edition:
The schedule for the 2019 edition is published here: https://hasgeek.com/anthillinside/2019/schedule
The conference has three tracks:
- Talks in the main conference hall track
- Poster sessions featuring novel ideas and projects in the poster session track
- Birds of Feather (BOF) sessions for practitioners who want to use the Anthill Inside forum to discuss:
  - Myths and realities of labelling datasets for Deep Learning.
  - Practical experience with using Knowledge Graphs for different use cases.
  - Interpretability and its application in different contexts; challenges with GDPR and interpreting datasets.
  - Pros and cons of using custom and open source tooling for AI/DL/ML.
# Who should attend Anthill Inside:
Anthill Inside is a platform for:
- Data scientists
- AI, DL and ML engineers
- Cloud providers
- Companies which make tooling for AI, ML and Deep Learning
- Companies working with NLP and Computer Vision who want to share their work and learnings with the community
For inquiries about tickets and sponsorships, call Anthill Inside on 7676332020 or write to firstname.lastname@example.org
Sponsorship slots for Anthill Inside 2019 are open.
Building a Context-Aware Knowledge graph using Graph analysis & Language models
At EtherLabs, we are building a video platform that provides insights into the numerous live audio and video meetings that organizations conduct every day. To extract critical metrics such as important moments, topics discussed and possible intents from textual data, basic NLP tasks like keyphrase extraction become essential. The extracted keyphrases can later be used for downstream tasks like topic modelling, intent detection, recommendation, search, and building knowledge graphs. We employed a completely unsupervised, graph-based approach to identify important keyphrases from large amounts of textual data. To make the graph more aware of the context of the discussion (HR, engineering, marketing, etc.), we use language models, trained and fine-tuned on specific domains, to re-rank the keyphrases based on domain knowledge.
Keyphrase extraction is a highly researched and well-defined task in the field of NLP. Approaches range from supervised methods (Bag of Words, TF-IDF) to unsupervised ones (graph-based and clustering) to deep learning algorithms applied to a mixture of both. Recent deep-learning-based approaches have yielded high performance in extracting keywords; however, these methods require large amounts of training data and time. Tools like spaCy and Gensim also provide black-box methods for the same task.
Although many methods and solutions are available for extracting keywords, we chose a graph-based approach inspired by the well-known TextRank algorithm (itself based on PageRank). The key motivations for choosing this approach are:
- Text data has been shown to carry important structural information. This kind of information can be captured by word graphs, with words forming the nodes and their co-occurrences forming the edges or relations.
- Graph-based methods work well with noisy text data and do not impose any training constraints.
- An unsupervised method lets us obtain candidate keywords, which can be further filtered using other methods like syntax rules, language models and ML classifiers.
- Graph-based extraction enables us to visualize and interpret how keywords are identified. This level of explainability helps in further fine-tuning the task, which would have been difficult with deep learning algorithms.
- Graph analysis on the word graphs provides other insights, such as community detection, which can be used for detecting potential topics.
- The concept behind building a word graph and computing keyword ranks using the PageRank algorithm.
- Using sentence embeddings from language models to bias the PageRank computation.
- Using graph analysis methods like betweenness centrality and the Louvain partitioning algorithm to detect topics (communities).
- Extending the word graph to a knowledge graph to capture other relations in the data.
- Exploring graph databases, Dgraph in particular, to persist graphs.
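As a rough illustration of the word-graph and biased-PageRank ideas listed above, here is a minimal sketch in plain Python. It is not the speaker's implementation: the function names `build_word_graph` and `pagerank`, the window size, and the `bias` weights are all hypothetical. The sketch assumes the tokens have already been cleaned (stopwords removed, lowercased), and the `bias` dictionary merely stands in for relevance scores that could be derived from sentence embeddings; how the talk computes them is not stated in the abstract.

```python
from collections import defaultdict


def build_word_graph(tokens, window=2):
    """Undirected co-occurrence graph: words are nodes, and an edge weight
    counts how often two words appear within `window` positions of each other."""
    graph = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                graph[word][other] += 1.0
                graph[other][word] += 1.0
    return graph


def pagerank(graph, bias=None, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration.

    `bias` (hypothetical) maps words to non-negative weights. When given, it
    replaces the uniform teleport distribution, which is the usual
    "personalized PageRank" way of nudging ranks toward preferred nodes.
    """
    nodes = list(graph)
    total = sum(bias.get(n, 0.0) for n in nodes) if bias else 0.0
    if total > 0:
        teleport = {n: bias.get(n, 0.0) / total for n in nodes}
    else:
        # No usable bias: fall back to the uniform teleport distribution.
        teleport = {n: 1.0 / len(nodes) for n in nodes}

    out_weight = {n: sum(graph[n].values()) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {}
        for n in nodes:
            # The graph is symmetric, so the neighbours of n are exactly the
            # nodes that pass rank to n.
            incoming = sum(rank[m] * w / out_weight[m] for m, w in graph[n].items())
            new_rank[n] = (1 - damping) * teleport[n] + damping * incoming
        rank = new_rank
    return rank


# Toy input, assumed to be pre-filtered tokens from a transcript.
tokens = (
    "graph based keyphrase extraction builds word graph "
    "ranks keyphrase candidates pagerank"
).split()

word_graph = build_word_graph(tokens)
uniform_ranks = pagerank(word_graph)
# Made-up bias that favours one word, standing in for an embedding-based score.
biased_ranks = pagerank(word_graph, bias={"pagerank": 1.0})

for word, score in sorted(uniform_ranks.items(), key=lambda kv: -kv[1])[:3]:
    print(f"{word}: {score:.3f}")
```

Comparing `uniform_ranks` with `biased_ranks` shows the effect of the teleport vector: words favoured by the bias move up the ranking, while the underlying co-occurrence structure still shapes the rest of the ordering.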
Speaker bio
Shashank is an AI/ML Engineer at EtherLabs, Bangalore. He has an MS degree in Computer Science (specialization in ML) from Delft University of Technology, Netherlands, and has over 4 years of research and technical experience in domains such as recommendation systems, healthcare, speech & multimedia technology, IoT, NLP and HCI.