The Fifth Elephant 2018

The seventh edition of India's best data conference

Ramanan Balakrishnan

@ramananbalakrishnan

A study in classification

Submitted May 29, 2018

Let me ask you a question: is a watch a time-keeping device, an electrical gadget, a collectible item or a piece of jewelry? (You can pick only one.) Such queries, mandated by governments across the world, cause sleepless nights for the global trade industry. The astronomical penalties for classification errors in import/export declarations are one key reason for worry.

In this session, we will take the Harmonized System (HS) classification problem as an example and talk about how we can build ML systems that handle such complexity and still classify accurately.

The talk will be broken down into sections describing the various stages of development. By sticking to a specific use-case, the talk will list the decisions that need to be made along the way, in the hope that generalizations can be derived from them.

The aim of this talk is to convey the questions and approaches that need to be considered when making ML-driven solutions successful within traditional business workflows.

Outline

Introduction [3-4 mins]

An introduction to the ML problem at hand (an import/export related classification task). Examples will be presented to highlight the complexity of the tasks involved. This section will also be used to explain the real-world implications of the system that we aim to develop. The use-case introduced in this section will be referred to throughout the talk.


Starting steps [5 mins]

This section will describe the ideal first steps. Approaches to analyzing the dataset will be presented. Expected outcomes will be discussed, together with the need to develop baseline guarantees.

Topics covered

  • dataset considerations
  • problem solving by pattern matching
  • analyzing existing workflows (aka the system you are looking to make redundant)
  • calibrating expectations
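
As a rough illustration of "problem solving by pattern matching" and of a baseline to calibrate expectations against, here is a minimal sketch of a keyword-rule classifier. The rules, the example descriptions and the two-digit chapter labels are hypothetical and chosen only for illustration; they are not taken from the talk.

```python
import re

# Hypothetical keyword rules mapping a product description to an
# illustrative two-digit chapter label. Rule order matters: the first
# matching rule wins.
RULES = [
    (re.compile(r"\b(watch(es)?|clocks?)\b", re.I), "91"),
    (re.compile(r"\b(necklaces?|rings?|bracelets?)\b", re.I), "71"),
    (re.compile(r"\b(chargers?|headphones?)\b", re.I), "85"),
]


def baseline_classify(description):
    """Return the label of the first matching rule, or None if no rule matches."""
    for pattern, label in RULES:
        if pattern.search(description):
            return label
    return None


def accuracy(examples):
    """Fraction of (description, expected_label) pairs the baseline gets right."""
    hits = sum(1 for text, label in examples if baseline_classify(text) == label)
    return hits / len(examples)
```

A description such as "gold bracelet watch" matches two rules and is resolved purely by rule order, which is the kind of ambiguity the watch example in the abstract points at. Measuring `accuracy` on a labeled sample gives a floor that any ML model should be expected to beat.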

Advanced considerations [5 mins]

In more complicated scenarios, additional (business-driven) objectives need to be considered before making decisions. This section will talk about how involving other project stakeholders can drastically affect your own internal roadmap towards a successful ML product.

Topics covered

  • business context considerations
  • other stakeholder involvement

Deployment and continuous learning [5 mins]

Given the knowledge gained in the earlier sections, we can now focus on what makes an ML deployment successful. The advantages of having a “human-in-the-loop” workflow will also be presented here. By introducing additional checkpoints at multiple stages, along with continuous monitoring, effective quantitative assessments can be carried out.

Topics covered

  • deployment scenarios
  • human-in-the-loop augmentation
  • effective monitoring outcomes
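
One common way to realize a human-in-the-loop checkpoint is to route low-confidence predictions to a reviewer and track how much is handled automatically. The sketch below assumes a model that emits a confidence score per prediction; the `Prediction` type, the function names and the 0.9 threshold are all hypothetical and not drawn from the talk.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    item: str          # product description
    label: str         # predicted class
    confidence: float  # model confidence in [0, 1]


def route(pred, threshold=0.9):
    """Accept confident predictions automatically; send the rest to a human."""
    return "auto_accept" if pred.confidence >= threshold else "human_review"


def split_queue(preds, threshold=0.9):
    """Partition predictions into (auto-accepted, needs-review) lists."""
    accepted, review = [], []
    for p in preds:
        (accepted if route(p, threshold) == "auto_accept" else review).append(p)
    return accepted, review


def automation_rate(preds, threshold=0.9):
    """Monitoring metric: share of predictions handled without a human."""
    accepted, _ = split_queue(preds, threshold)
    return len(accepted) / len(preds) if preds else 0.0
```

Tracking `automation_rate` alongside the error rate found by reviewers is one simple quantitative check; the threshold can then be tuned against the cost of a misclassification.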

Conclusion [3-4 mins]

This section will serve as a recap of the entire talk. The steps followed in the earlier sections will be summarized and, hopefully, presented as a generalizable approach for others.

Speaker bio

I am a member of the data science team at Semantics3, building data-powered software for ecommerce-focused companies. Over the years, I have had the chance to dabble in various fields covering data processing, pipeline setup, database management and data science. When not picking locks or scuba diving, I usually blog about my technical adventures at our team’s engineering blog and sometimes speak at conferences.

Slides

https://goo.gl/781M2n

