## About the 2019 edition:
The schedule for the 2019 edition is published here: https://hasgeek.com/anthillinside/2019/schedule
The conference has three tracks:
- Talks in the main conference hall track
- Poster sessions featuring novel ideas and projects in the poster session track
- Birds of a Feather (BOF) sessions for practitioners who want to use the Anthill Inside forum to discuss:
  - Myths and realities of labelling datasets for Deep Learning.
  - Practical experience with using Knowledge Graphs for different use cases.
  - Interpretability and its application in different contexts; challenges with GDPR and interpreting datasets.
  - Pros and cons of using custom and open source tooling for AI/DL/ML.
## Who should attend Anthill Inside:
Anthill Inside is a platform for:
- Data scientists
- AI, DL and ML engineers
- Cloud providers
- Companies which make tooling for AI, ML and Deep Learning
- Companies working with NLP and Computer Vision who want to share their work and learnings with the community
For inquiries about tickets and sponsorships, call Anthill Inside on 7676332020 or write to email@example.com
Sponsorship slots for Anthill Inside 2019 are open.
## Rigorous Evaluation of NLP Models for Real World Deployment
Rapid progress in NLP research has translated swiftly into real-world commercial deployment. While a number of NLP application success stories have emerged, there have also been considerable failures in translating scientific progress in NLP into real-world software (some of these issues are covered in my IJCAI paper https://www.ijcai.org/proceedings/2018/717). In particular, the challenges and gaps in testing and rigorous evaluation of NLP applications have largely remained unaddressed. Of late, there has been considerable debate and research into understanding what NLP models have really learnt when trained for a specific task. Instead of just reporting a few metrics such as accuracy or F1-score on a handful of datasets, a deeper understanding of NLP models is essential: their robustness across the input space and their generalization capabilities. One reason many NLP models fail to generalize in the real world is the lack of detailed evaluation over a comprehensive set of inputs (both adversarial and non-adversarial), and of the biases and weaknesses they encode. This talk will cover the need for rigorous evaluation of NLP models, current research and industry best practices in this area, and practical tips to evaluate the generalizability and robustness of your model for production readiness.
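To make the "beyond a single aggregate metric" point concrete, here is a minimal, hypothetical sketch (not from the talk) of slice-based evaluation: accuracy is reported per input slice rather than as one number. The toy lexicon classifier, example sentences, and the "negated vs plain" slicing rule are all invented for illustration; in practice `predict` would wrap your trained model.

```python
# Slice-based evaluation sketch: report accuracy per input slice
# (here: negated vs plain sentences) instead of one aggregate number.
# The model, data and slicer are hypothetical stand-ins.
from collections import defaultdict

POS = {"good", "great", "fun"}
NEG = {"bad", "dull"}

def predict(text):
    """Toy lexicon classifier that (deliberately) ignores negation."""
    toks = text.lower().split()
    score = sum(t in POS for t in toks) - sum(t in NEG for t in toks)
    return "positive" if score >= 0 else "negative"

def sliced_accuracy(examples, predict, slicer):
    """examples: list of (text, gold_label); slicer maps text -> slice name."""
    correct, total = defaultdict(int), defaultdict(int)
    for text, gold in examples:
        s = slicer(text)
        total[s] += 1
        correct[s] += predict(text) == gold
    return {s: correct[s] / total[s] for s in total}

examples = [
    ("a good fun film", "positive"),
    ("a dull bad film", "negative"),
    ("not a good film", "negative"),
    ("not dull at all", "positive"),
]
slicer = lambda t: "negated" if "not" in t.lower().split() else "plain"
report = sliced_accuracy(examples, predict, slicer)
print(report)
```

On this toy data the aggregate accuracy is 0.5, but the per-slice report shows the model is perfect on plain sentences and wrong on every negated one, the kind of systematic weakness a single headline metric hides.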
This talk is aimed at NLP engineers and researchers looking for a deeper understanding of NLP model evaluation and robustness to real world inputs. (The audience should have at least 1-2 years of experience in ML/NLP. Desirable: knowledge of basic concepts such as robustness, adversarial testing and generalization.)
Key takeaways would be (a) current gaps in evaluating NLP models; (b) an overview of research on rigorous evaluation of NLP models; (c) how these research findings can be applied in practice to evaluate and improve NLP model robustness.
We motivate, with a few historical use cases, why rigorous evaluation of NLP models beyond simple metrics such as F1-score or accuracy is needed for real world deployment. We then talk about the “CleverHans Moment for NLP” (https://www.linkedin.com/posts/sandya_nlps-clever-hans-moment-has-arrived-activity-6573894455768768512-MDVW). We discuss the latest research on model evaluation for NLP, then take up a sentiment analysis task as a case study and discuss the methodology for rigorous evaluation. We conclude by pointing out directions for future work on this topic.
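One common ingredient of such a methodology is perturbation-based invariance testing: apply label-preserving transformations to inputs and measure how often the model's prediction stays stable. The sketch below is hypothetical and not from the talk; the toy lexicon classifier and the two perturbations (appending a neutral phrase, introducing a typo) are invented so the loop runs end to end, and `predict_sentiment` would normally wrap a real trained model.

```python
# Perturbation-based robustness evaluation sketch for sentiment analysis.
# The classifier and perturbations are hypothetical stand-ins.

POSITIVE = {"good", "great", "excellent", "enjoyable"}
NEGATIVE = {"bad", "awful", "terrible", "boring"}

def predict_sentiment(text):
    """Toy stand-in for a trained sentiment model."""
    toks = text.lower().split()
    score = sum(t in POSITIVE for t in toks) - sum(t in NEGATIVE for t in toks)
    return "positive" if score >= 0 else "negative"

def add_neutral_suffix(text):
    """Invariance check: appending neutral text should not flip the label."""
    return text + " I watched it on Tuesday."

def introduce_typo(text):
    """Invariance check: swap the first two characters of the longest token."""
    tokens = text.split()
    i = max(range(len(tokens)), key=lambda i: len(tokens[i]))
    t = tokens[i]
    if len(t) >= 2:
        tokens[i] = t[1] + t[0] + t[2:]
    return " ".join(tokens)

def robustness_report(texts, perturbations):
    """Fraction of inputs whose prediction is stable under each perturbation."""
    return {
        name: sum(predict_sentiment(t) == predict_sentiment(p(t)) for t in texts)
        / len(texts)
        for name, p in perturbations.items()
    }

texts = [
    "The movie was great and the acting excellent",
    "A boring plot and terrible pacing",
    "Good soundtrack but awful dialogue",
]
report = robustness_report(
    texts, {"neutral_suffix": add_neutral_suffix, "typo": introduce_typo}
)
print(report)
```

The toy model happens to be fully stable under both perturbations on these three sentences; a real model's stability scores would typically fall below 1.0, and those gaps are exactly what rigorous evaluation is meant to surface.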
Participants should have intermediate knowledge of NLP model building and tuning. Knowledge of concepts such as robustness, adversarial evaluation and generalization is desirable but not essential.
Sandya Mannarswamy (https://www.linkedin.com/in/sandya/) is an independent NLP researcher. She was previously a senior research scientist in the Natural Language Processing research group at Conduent Labs India. She holds a Ph.D. in computer science from the Indian Institute of Science, Bangalore. Her research interests span natural language processing, machine learning and compilers. Her research career spans over 19 years at various R&D labs, including Hewlett Packard and IBM Research. She has co-organized a number of workshops, including workshops at the International Conference on Data Management and the Machine Learning Debates workshop at ICML 2018. She has a number of international research publications and patents in the area of natural language processing (https://scholar.google.co.in/citations?hl=en&user=i27nd3oAAAAJ&view_op=list_works&sortby=pubdate). She co-authored a paper at the International Joint Conference on Artificial Intelligence (IJCAI) 2018 which focused on the challenges in taking AI applications from research to the real world. Her current research focuses on rigorous evaluation and software testing of Natural Language Processing applications (“using NLP to evaluate NLP”). She is the author of the popular “CodeSport” column in Open Source For You magazine (https://opensourceforu.com/tag/codesport/).