Anthill Inside 2019

A conference on AI and Deep Learning



Sandya Mannarswamy


Tutorial on Testing of Machine Learning Applications

Submitted Sep 3, 2019

##URL for workshop date, time, venue, schedule and tickets:

Rapid progress in Machine Learning (ML) has translated swiftly into real-world commercial deployment. While research and development of ML applications have progressed at an exponential pace, the software engineering process for ML applications, and the corresponding ecosystem of testing and quality assurance tools that make software reliable, trustworthy, safe and easy to deploy, have lagged behind. Specifically, the challenges and gaps in quality assurance (QA) and testing of AI applications have largely remained unaddressed, contributing to a poor translation rate of ML applications from research to the real world [107]. Unlike traditional software, which has a well-defined testing methodology, ML applications have largely taken an ad hoc approach to testing. ML researchers and practitioners either fall back on traditional software testing approaches, which are inadequate for this domain because of its inherently probabilistic and data-dependent nature, or rely on non-rigorous, self-defined quality assurance methodologies. These issues have driven the ML and Software Engineering research communities to develop newer tools and techniques designed specifically for ML. These research advances need to be publicized and put into practice in real-world ML development and deployment to enable the successful translation of ML from research prototypes to the real world. This tutorial intends to address this need.
This tutorial aims to:

  1. Provide a comprehensive overview of testing of ML applications
  2. Provide practical insights and share community best practices for testing ML software

The target audience for this tutorial is the data science and machine learning community, including:

  1. Industry Machine Learning practitioners and solution architects
  2. Software developers/ML Engineers who are developing production machine learning applications
  3. Software quality assurance and testing professionals who have to test ML applications
  4. Student ML enthusiasts
  5. ML researchers (industry/academic)

***A basic familiarity with ML concepts, as well as basic to intermediate experience in developing ML applications, is expected of the tutorial audience.*** The audience should be familiar with the general software development life cycle and have intermediate coding ability in a high-level programming language such as Python/R/Java/C/C++/Matlab that they have used for developing ML applications. This tutorial does not require any prior knowledge of traditional software testing and quality assurance methodologies.

Key takeaways for the audience include:

  1. Overview of testing ML applications - How/Why/What
  2. Tools and Techniques available for testing ML applications
  3. Practical insights and tips to incorporate into their own work on testing ML models

We have set up a survey for the tutorial participants so that we can fine-tune the contents based on the responses.


This will be a half-day tutorial consisting of four parts. The first part of the tutorial will cover the fundamental concepts of ML testing, followed by coverage of state-of-the-art techniques and methods in each of the sub-topics:

  1. How to Test – ML Testing Workflow
  2. What Components to Test
  3. What Properties to Test for
  4. Testing for different application scenarios

With the audience armed with this background, the second part of the tutorial will cover the stages of the Machine Learning life cycle from a Software Quality Assurance (SQA) perspective, outlining the key quality assurance requirements for each stage and methods to meet these requirements. We will cover existing open source and commercial tools available for ML testing, along with data sets available for ML testing. This session will also provide tips and actionable insights for improving software quality in the ML life cycle.

The third part of the tutorial will focus on the research challenges and open problems in this space, pointing out potential opportunities. This part will highlight the process of taking ML applications from research to real-world industry, and point out the process and product gaps and challenges that must be addressed for successful translation of ML applications to real-world deployment.


  1. Part I – 50 minutes followed by Q&A for 10 minutes
  2. Part II – 50 minutes followed by Q&A for 10 minutes
  3. Part III – 50 minutes followed by Q&A for 10 minutes
  4. Part IV – Hands-on session – 45 minutes

Part I – Overview of Testing Machine Learning Applications

What is ML Testing

How is it different from traditional software testing

Essentials of Machine Learning Testing

ML Testing Workflow (How to Test)

Test Input Generation

Metamorphic Testing

Test Selection & Prioritization

Test coverage metrics

Components to test

Data Testing

Model Code Testing

ML Framework Testing

Properties to test

Correctness testing

Overfitting detection

Robustness testing

Efficiency testing

Fairness testing

Interpretability testing

Security testing

Example Application Scenarios to Test

Neural Machine Translation
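
As a rough illustration of the metamorphic testing topic listed in Part I, the sketch below checks two metamorphic relations on a toy classifier. It is not taken from the tutorial material: the 1-nearest-neighbour classifier `predict_1nn`, the data in `TRAIN` and `QUERIES`, and both relation checks are hypothetical names chosen for this example. The idea is that even when the correct output for an input is unknown, one can still test that outputs stay consistent under input transformations that should not matter, here shuffling the training set and rescaling every feature by the same positive factor.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def predict_1nn(train, query):
    """Return the label of the training point nearest to `query`.

    `train` is a list of (point, label) pairs. Ties are broken on the
    point and label values, so the result does not depend on list order.
    """
    _, label = min(
        train,
        key=lambda item: (squared_distance(item[0], query), item[0], item[1]),
    )
    return label


def scale(point, factor):
    """Multiply every coordinate of `point` by `factor`."""
    return tuple(factor * x for x in point)


def check_permutation_relation(train, queries, seed=0):
    """Relation 1: shuffling the training set must not change predictions."""
    shuffled = list(train)
    random.Random(seed).shuffle(shuffled)
    return all(
        predict_1nn(train, q) == predict_1nn(shuffled, q) for q in queries
    )


def check_scaling_relation(train, queries, factor=4.0):
    """Relation 2: scaling all features by one positive factor, in both the
    training set and the query, must not change predictions."""
    scaled_train = [(scale(point, factor), label) for point, label in train]
    return all(
        predict_1nn(train, q) == predict_1nn(scaled_train, scale(q, factor))
        for q in queries
    )


# Hypothetical toy data: two well-separated classes in two dimensions.
TRAIN = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((5.0, 5.0), "B"),
    ((6.0, 4.5), "B"),
]
QUERIES = [(1.2, 1.1), (5.5, 5.0), (3.0, 3.2)]

if __name__ == "__main__":
    print("permutation relation holds:", check_permutation_relation(TRAIN, QUERIES))
    print("scaling relation holds:", check_scaling_relation(TRAIN, QUERIES))
```

Neither check needs ground-truth labels for the queries, which is what makes this style of test usable when no test oracle exists. A relation that fails points to a defect in the implementation or to an assumption about the model that does not hold.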

Part II – Machine Learning Life Cycle from Software Quality Assurance Perspective

Stages of Machine Learning Life Cycle

Software testing for different types of machine learning

Testing supervised learning models

Testing unsupervised learning models

Testing reinforcement learning models

Datasets for ML Testing

Commercial and Open source tools for ML Testing

Part III – ML Testing Horizon

Best practices for ML testing

Research Challenges

Open problems

Potential Future Work

Relevant resources

Part IV – Hands on exercises

Testing intent classification for chatbots

Bias detection and fairness testing using open source tools
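
To give a flavour of what a bias detection check can look like, the sketch below computes per-group selection rates and the demographic parity difference for a set of binary predictions. It is hand-written for illustration only and does not represent the API of any particular open source fairness tool or the exercises planned for the session. The function names, the 0.1 threshold in `passes_parity_check`, and the `PREDICTIONS` and `GROUPS` data are all assumptions made for this example.

```python
from collections import defaultdict


def selection_rates(predictions, groups):
    """Fraction of positive (== 1) predictions for each group."""
    if len(predictions) != len(groups):
        raise ValueError("predictions and groups must have the same length")
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_difference(predictions, groups):
    """Gap between the highest and lowest per-group selection rate."""
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


def passes_parity_check(predictions, groups, max_gap=0.1):
    """True if the selection-rate gap is within `max_gap`.

    The default threshold is an arbitrary value chosen for this example.
    """
    return demographic_parity_difference(predictions, groups) <= max_gap


# Hypothetical model outputs (1 = favourable outcome) and the protected
# group of each individual.
PREDICTIONS = [1, 1, 1, 0, 1, 0, 0, 0]
GROUPS = ["a", "a", "a", "a", "b", "b", "b", "b"]

if __name__ == "__main__":
    print("selection rates:", selection_rates(PREDICTIONS, GROUPS))
    print("parity difference:", demographic_parity_difference(PREDICTIONS, GROUPS))
    print("passes check:", passes_parity_check(PREDICTIONS, GROUPS))
```

Demographic parity is only one of several fairness criteria, and which criterion and threshold are appropriate depends on the application. A check like this can run as an ordinary unit test against a held-out evaluation set.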


Participants should bring their own laptops. Once we finalize the contents, we will provide a list of open source libraries to be installed before the tutorial session for the hands-on exercises.

Speaker bio

This tutorial will be organized by three of us:

  1. Sandya Mannarswamy, Independent NLP Research Scientist
  2. Shourya Roy, Head, American Express AI Labs
  3. Saravanan Chidambaram, Independent NLP Researcher & Consultant

Sandya Mannarswamy is an independent NLP researcher. She was previously a senior research scientist at Conduent Labs India in the Natural Language Processing research group. She holds a Ph.D. in computer science from Indian Institute of Science, Bangalore. Her research interests span natural language processing, machine learning and compilers. Her research career spans over 16 years at various R&D labs, including Hewlett Packard Ltd and IBM Research. She has co-organized a number of workshops, including workshops at the International Conference on Data Management and the Machine Learning Debates workshop at ICML 2018. She has extensive experience in traditional software engineering, having worked on research and development of the developer tools ecosystem (compiler, debugger, performance analyzer, static source code analyzer) during her career at Hewlett Packard. Together with Shourya, she co-authored a paper at IJCAI 2018 on the challenges in taking AI applications from research to the real world. Her current research focuses on software testing and rigorous evaluation of NLP applications (using NLP to evaluate NLP). She is the author of the popular CodeSport column in Open Source For You magazine.

Shourya Roy is Head and VP of American Express AI Labs, which is spearheading innovations in the areas of machine learning, NLP and document recognition, cloud computing and AI product management for American Express. Shourya’s research interests span text and web mining and natural language processing. He holds a Ph.D. in computer science from Indian Institute of Science, Bangalore. Over 15 years, across his current role and his prior association with the research labs of IBM and Xerox, Shourya’s work has led to 15 granted patents and about 70 publications in premier journals and conferences. In recent years, Shourya has co-organized a number of workshops at tier-1 conferences (ICML 2018, KDD 2018, SIGMOD 2016-18, ECML 2016, ICDE 2016-17), and notably co-initiated and ran the Noisy Text Analytics (AND) series of workshops from 2007 to 2012. He is currently serving as the Vice Chair of the India Chapter of SIGKDD (IKDD).

Saravanan Chidambaram (Saro) is an independent consultant in Machine Learning and AI technologies. Previously he was head of the Advanced Development Centre at Hewlett Packard Enterprise, where he led the research team exploring emerging technologies, including AI, blockchain and ML. Over a career spanning 16 years at various R&D labs, including Hewlett Packard Ltd, Microsoft and Oracle, he has led many research and development projects in the areas of virtualization, compilers, kernel and big data, focusing on designing and deploying mission-critical enterprise software. Saro is passionate about educating the emerging ML software developer community to adopt rigorous software quality assurance techniques. He is currently developing a test suite for testing NLP applications.



Hosted by

Anthill Inside is a forum for conversations about risk mitigation and governance in Artificial Intelligence and Deep Learning.