Opening the NLP Blackbox - Analysis, Evaluation and Testing of NLP Models

This project is accepting submissions for the November edition of the MLOps conference.

The first edition of the MLOps conference was held on 23, 24 and 27 July. Details about the conference, including videos and blog posts, are published at https://hasgeek.com/fifthelephant/mlops-conference/

Contact information: For inquiries, contact The Fifth Elephant at fifthelephant.editorial@hasgeek.com or call 7676332020.

Hosted by The Fifth Elephant

The Fifth Elephant, known as one of the best data science and machine learning conferences in Asia, has transitioned into a year-round forum for conversations about data and ML engineering, data science in production, and data security and privacy practices.

Submitted Jun 13, 2021

Rapid progress in NLP research has translated swiftly into real-world commercial deployments. While a number of NLP applications have become success stories, failures to translate scientific progress in NLP into real-world software have also been considerable. Evaluation of NLP models is often limited to held-out test set accuracy on a handful of datasets, and analysis of NLP models is often limited to ablation studies. This lack of rigorous evaluation leads to overestimating a model's generalization performance. A lack of understanding of the model's inner workings results in "Clever Hans" models, which fail in real-world deployments. One reason many NLP models fail to generalize in the real world is the lack of detailed evaluation over a comprehensive set of inputs, together with the lack of model analysis to uncover the biases and weaknesses the model encodes.
Of late, there has been considerable research interest in analysis methods for NLP models and in evaluation techniques that go beyond test set performance metrics. However, this work is still not widely disseminated in the NLP community. This talk aims to address this gap by providing a detailed overview of NLP model analysis and evaluation methods, discussing their strengths and weaknesses, and pointing towards future research directions in this area.

This talk is intended to provide an in-depth overview of the analysis and evaluation methods for NLP models, covering existing techniques, challenges and research opportunities in this space.
We motivate, with specific use cases and examples, why real-world deployment requires evaluating NLP models rigorously, beyond simple metrics such as F1 score or accuracy. We then talk about the "Clever Hans moment for NLP", wherein models learn dataset-specific features and solve the dataset instead of learning to solve the actual task at hand. This motivates the need for robust methods of model analysis and evaluation.
Next, in the context of NLP model analysis and evaluation, we focus on four important questions:

What is the model's internal structure, in terms of the knowledge it has captured?
How does the model behave on different inputs?
How do we visualize the model's inner workings?
How do we quantify the model's strengths and weaknesses?
For each of these questions, we discuss the existing methods, point out their comparative advantages and disadvantages, and briefly outline possible future research directions.
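To make the "Clever Hans" argument concrete, here is a minimal, hypothetical sketch of a perturbation-based invariance test, one simple way of probing a model's behaviour on different inputs. A toy classifier that keys on a spurious token scores perfectly on a held-out set drawn from the same biased data, yet flips its prediction under edits that should not change the label. The model, data, and function names (`clever_hans_model`, `accuracy`, `invariance_failures`) are illustrative assumptions, not methods taken from the talk.

```python
# Hypothetical toy example: a "Clever Hans" sentiment classifier that keys on
# a spurious token instead of on the words that actually carry sentiment.


def clever_hans_model(text: str) -> str:
    # Assumption: in this toy training data every positive review happened to
    # mention "movie", so the model learned that token as a shortcut.
    return "positive" if "movie" in text.lower() else "negative"


def accuracy(model, examples):
    """Fraction of (text, label) pairs the model predicts correctly."""
    correct = sum(model(text) == label for text, label in examples)
    return correct / len(examples)


def invariance_failures(model, pairs):
    """Return the (original, perturbed) pairs whose prediction changes
    under an edit that should preserve the label."""
    return [(orig, pert) for orig, pert in pairs if model(orig) != model(pert)]


# Held-out test set drawn from the same (biased) distribution as training.
held_out = [
    ("A wonderful movie with a great cast", "positive"),
    ("I loved this movie", "positive"),
    ("Dull plot and terrible acting", "negative"),
    ("A boring waste of two hours", "negative"),
]

# Label-preserving perturbations: swapping or inserting a neutral word.
perturbation_pairs = [
    ("I loved this movie", "I loved this film"),
    ("Dull plot and terrible acting", "Dull movie plot and terrible acting"),
]

print(f"held-out accuracy: {accuracy(clever_hans_model, held_out):.2f}")
failures = invariance_failures(clever_hans_model, perturbation_pairs)
print(f"invariance failures: {len(failures)} of {len(perturbation_pairs)}")
# prints "held-out accuracy: 1.00" then "invariance failures: 2 of 2"
```

Held-out accuracy alone reports a perfect model here; only the behavioural test reveals that it has learned the dataset rather than the task.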
