A study in classification
Submitted by Ramanan Balakrishnan (@ramananbalakrishnan) on Tuesday, 29 May 2018
Let me ask you a question: is a watch a time-keeping device, an electrical gadget, a collectible item or a piece of jewelry? (You can pick only one.) Such queries, mandated by governments across the world, cause sleepless nights for the global trade industry, and the astronomical penalties for classification errors in import/export declarations are one key reason for the worry.
In this session, we will take this Harmonized System (HS) classification problem as an example and talk about how we can build ML systems that handle such complexity and still classify accurately.
The talk will be broken down into individual sections describing the various stages of development. By sticking to a specific use-case, the talk will list the decisions that need to be made along the way, in the hope that generalizations can be derived from them.
The aim of this talk is to convey the questions and approaches that need to be considered when making ML-driven solutions successful within traditional business workflows.
Introduction [3-4 mins]
An introduction to the ML problem at hand (an import/export related classification task). Examples will be presented to highlight the complexity of the tasks involved. This section will also be used to explain the real-world implications of the system that we aim to develop. The use-case introduced in this section will be referred to throughout the talk.
Starting steps [5 mins]
This section will describe the ideal first steps. Approaches to analyzing the dataset will be presented. Expected outcomes will be discussed, together with the need to develop baseline guarantees.
- dataset considerations
- problem solving by pattern matching
- analyzing existing workflows (aka the system you are looking to make redundant)
- calibrating expectations
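As a rough illustration of "problem solving by pattern matching", a keyword baseline might look like the minimal sketch below. It is not the system described in the talk: the rule list, the function names and the two-digit chapter labels are all assumptions made for this example, and the labels should be checked against the official HS nomenclature before any real use.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical keyword rules mapping a product description to a two-digit
# chapter label. Both the keywords and the labels are illustrative assumptions.
RULES: List[Tuple["re.Pattern[str]", str]] = [
    (re.compile(r"\b(watch|wristwatch|clock)\b"), "91"),
    (re.compile(r"\b(necklace|bracelet|earrings?)\b"), "71"),
    (re.compile(r"\b(charger|battery|headphones?)\b"), "85"),
]


def baseline_classify(description: str) -> Optional[str]:
    """Return the label of the first matching rule, or None if nothing matches."""
    text = description.lower()
    for pattern, chapter in RULES:
        if pattern.search(text):
            return chapter
    return None


def baseline_accuracy(examples: List[Tuple[str, str]]) -> float:
    """Fraction of (description, label) pairs that the rules get right."""
    if not examples:
        return 0.0
    correct = sum(1 for text, label in examples if baseline_classify(text) == label)
    return correct / len(examples)
```

Because the first matching rule wins, a description such as "Silver bracelet watch" is assigned to the watch rule even though it also matches the jewelry rule. That is the same ambiguity as the opening question, and the accuracy of such a baseline gives a floor against which any ML model can be calibrated.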
Advanced considerations [5 mins]
In more complicated scenarios, additional (business-driven) objectives need to be considered before making decisions. This section will talk about how involving other project stakeholders can drastically affect your own internal roadmap towards a successful ML product.
- business context considerations
- other stakeholder involvement
Deployment and continuous learning [5 mins]
Given the knowledge gained in the earlier sections, we can now focus on what makes an ML deployment successful. The advantages of a “human-in-the-loop” workflow will also be presented here. By introducing additional checkpoints at multiple stages and monitoring continuously, effective quantitative assessments can be carried out.
- deployment scenarios
- human-in-the-loop augmentation
- effective monitoring outcomes
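One common way to combine human-in-the-loop augmentation with monitoring is to route low-confidence predictions to a reviewer and track how often the reviewer agrees with the model. The sketch below assumes that setup; the `Prediction` and `ReviewLog` types, the `route` function and the 0.9 threshold are hypothetical and chosen only for illustration, not taken from the talk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prediction:
    item_id: str
    label: str
    confidence: float  # model score in [0, 1]


def route(prediction: Prediction, threshold: float = 0.9) -> str:
    """Send confident predictions straight through, the rest to a human."""
    return "auto" if prediction.confidence >= threshold else "review"


@dataclass
class ReviewLog:
    """Running count of how often reviewers keep the model's label."""
    total: int = 0
    agreed: int = 0

    def record(self, predicted_label: str, reviewer_label: str) -> None:
        self.total += 1
        if predicted_label == reviewer_label:
            self.agreed += 1

    def agreement_rate(self) -> Optional[float]:
        """Share of reviewed items left unchanged, or None before any review."""
        if self.total == 0:
            return None
        return self.agreed / self.total
```

A falling agreement rate is one quantitative signal that the model or the threshold needs attention, and the corrected labels collected during review can be fed back as training data.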
Conclusion [3-4 mins]
This section will serve as a recap of the entire talk. The approach followed in the earlier sections will be summarized and, hopefully, presented in a form that others can generalize to their own problems.
I am a member of the data science team at Semantics3, building data-powered software for ecommerce-focused companies. Over the years, I have had the chance to dabble in various fields covering data processing, pipeline setup, database management and data science. When not picking locks or scuba diving, I usually blog about my technical adventures at our team’s engineering blog and sometimes speak at conferences.