Building a Large scale Augmented classifier ensemble to predict in noisy data
Submitted by Arthi Venkataraman (@arthi) on Wednesday, 15 June 2016
We investigated different types of classifiers for classifying problem tickets in the enterprise domain. Even after data cleaning and other accuracy-improving pre-processing techniques, building an accurate classifier remained a challenge. An ensemble of classifiers gave better accuracy than the individual classifiers. The highest accuracy was obtained by augmenting the ensemble with automatically generated, domain-specific, class-wise patterns. This system gave us an improvement of more than 4 percent over using the ensemble classifier alone.
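To make the idea of an ensemble augmented with class-wise patterns concrete, here is a minimal sketch in Python. It is not the system described in the talk: the pattern table, the `min_agreement` threshold, and the rule for when patterns override the vote are all assumptions made for illustration, and the patterns here are written by hand rather than generated automatically.

```python
import re
from collections import Counter

# Hypothetical class-wise patterns, hand-written for this sketch.
# The talk generates such patterns automatically; how is not shown here.
CLASS_PATTERNS = {
    "Payroll": re.compile(r"\b(salary|payslip|tax)\b", re.IGNORECASE),
    "Infrastructure": re.compile(r"\b(laptop|vpn|network)\b", re.IGNORECASE),
}


def majority_vote(predictions):
    """Return the most common label and the share of votes it received."""
    label, count = Counter(predictions).most_common(1)[0]
    return label, count / len(predictions)


def augmented_predict(text, classifiers, min_agreement=0.6):
    """Predict a ticket category with an ensemble plus class-wise patterns.

    `classifiers` is a list of callables mapping text to a label.
    When the ensemble agrees strongly, its vote is used. Otherwise, if
    exactly one class pattern matches the text, that class is returned
    (an assumed combination rule, chosen for simplicity).
    """
    votes = [clf(text) for clf in classifiers]
    label, share = majority_vote(votes)
    if share >= min_agreement:
        return label
    matches = [cls for cls, pat in CLASS_PATTERNS.items() if pat.search(text)]
    if len(matches) == 1:
        return matches[0]
    # No unambiguous pattern match: fall back to the ensemble's vote.
    return label
```

In this sketch the patterns act only as a tie-breaker for low-agreement tickets, so a confident ensemble is never overridden. Other combinations, such as feeding pattern matches to the classifiers as extra features, are equally plausible.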
Takeaways - Challenges in real-life classification, data curation techniques, ensemble classification, increasing the accuracy of an ensemble through the proposed techniques, building a scalable prediction system
Employees face many issues across different functions in the organization, such as Payroll, Infrastructure, Facilities, and Applications. When employees raise a ticket, they are traditionally asked to manually select a category for it from a hierarchical, menu-driven interface. End users often select the wrong category. As a result, tickets are raised in the wrong bucket, and the resulting re-assignments delay resolution.
We have built a system that accepts natural language text input from the end user and automatically classifies the ticket into the right category. Key challenges in building such a system are an inaccurate training corpus, many closely separated classes, imbalanced classes, many classes with too little data, and the need to handle a large number of requests.
The challenges faced and the technical steps needed to build such a system are described. The presentation will take you through the evolution of the solution, the limitations of the solution at each stage, the accuracy achieved at each stage, and the improvements to the solution architecture added at each stage.
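Two of these challenges, imbalanced classes and classes with too little data, have common generic mitigations, sketched below. The source does not say how the system handled them, so this is an illustration only: the function names, the `min_count` threshold, and the "Other" bucket are assumptions. The weighting formula is the widely used "balanced" heuristic, n_samples / (n_classes * class_count).

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency.

    Uses n_samples / (n_classes * class_count), so rare classes get
    weights above 1 and frequent classes get weights below 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def merge_rare_classes(labels, min_count, other_label="Other"):
    """Relabel classes with fewer than `min_count` examples as `other_label`."""
    counts = Counter(labels)
    return [lbl if counts[lbl] >= min_count else other_label for lbl in labels]
```

For example, with six Payroll tickets, three Infrastructure tickets, and one Facilities ticket, the Facilities class gets the largest weight, and `merge_rare_classes(labels, min_count=2)` folds it into "Other". Tickets landing in such a bucket would still need manual routing, which is the trade-off of this approach.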
Prerequisite knowledge of machine learning and classification will be helpful.
Arthi Venkataraman has 19+ years of experience in the design, development and testing of projects in different domains. She is currently a Senior Member in the Distinguished Members of Technical Staff cadre at Wipro Technologies. Her current role involves solution development for different business problems in the technology areas of Natural Language Processing, Machine Learning and Semantic Technologies.
She has a B.E. degree in Computer Science from University Visvesvaraya College of Engineering, Bangalore, and an MBA (PGDSM) from IIM Bangalore. She has previously presented papers and spoken at other international conferences. This presentation is based on Arthi’s experience in building a large-scale, production-grade classifier using Python at her organization.