Anthill Inside 2019

On infrastructure for AI and ML: from managing training data to data storage, cloud strategy and costs of developing ML models

Advanced NLP and Deep Learning for document classification - A case study in civil aviation safety prognosis

Submitted by prabhakar srinivasan (@prabhacar7) on Wednesday, 1 May 2019

Section: Tutorials Technical level: Intermediate Session type: Tutorial


In this presentation, I apply a set of data-mining and sequential deep learning techniques to accident reports published by the National Transportation Safety Board (NTSB) in order to support real-time prognosis of adverse events. The focus is on learning from text data that describes sequences of events. The NTSB publishes post-hoc investigation reports that contain raw text narratives of each investigation along with the corresponding concise event sequences. Classification models are developed for Class A passenger air carriers that take either an observed sequence of events or the corresponding raw text narrative as input and predict whether an accident or an incident is the likely outcome, whether the aircraft is likely to be damaged, and whether any fatalities are likely.
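As a rough illustration of the first preprocessing step, the three prediction targets above can be derived as binary labels from fields in each report. The field names (`ev_type`, `damage`, `fatal_count`) and code values below are hypothetical placeholders, not the actual NTSB schema:

```python
# Sketch: mapping one (hypothetical) report record to the three
# binary targets: accident vs. incident, damage, and fatalities.
def make_labels(report):
    """Derive the three supervised-learning targets from a report dict."""
    return {
        "is_accident": 1 if report["ev_type"] == "ACC" else 0,
        "is_damaged": 1 if report["damage"] in ("SUBS", "DEST") else 0,
        "has_fatalities": 1 if report["fatal_count"] > 0 else 0,
    }

sample = {
    "ev_type": "ACC",          # accident (vs. "INC" for incident)
    "damage": "SUBS",          # substantial damage
    "fatal_count": 0,
    "narrative": "The aircraft veered off the runway during landing.",
}
labels = make_labels(sample)
```

The narrative or event sequence then becomes the model input, and these labels become the targets of three binary classifiers.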


Sequential models for NLP are gaining popularity, and this presentation covers a case study of applying these techniques to solve real problems in US civil aviation. The classification models are developed using word embeddings and the Long Short-Term Memory (LSTM) architecture. The proposed methodology is implemented in two steps: (i) transform the NTSB data extracts into labeled datasets for building supervised machine learning models; and (ii) develop DL models for prognosis of adverse events such as accidents, aircraft damage, or fatalities. We also develop a prototype of an interactive query interface that lets end-users test various scenarios, including complete or partial event sequences or narratives, and get predictions regarding the adverse events. The presentation is accompanied by a demo component, and F1-score metrics are used to evaluate the effectiveness of the technique. The audience will gain in-depth insight into the technology stack used for this deep learning application and ways to troubleshoot the usual problems of noise in natural language data.
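To make the embedding + LSTM pipeline concrete, here is a minimal numpy sketch of the forward pass such a classifier performs: each token is embedded, an LSTM consumes the sequence, and the final hidden state feeds a binary output head. All dimensions are illustrative and the weights are random (untrained); this is not the speaker's actual implementation, which would typically use a framework such as Keras or PyTorch:

```python
import numpy as np

# Illustrative dimensions; a real model would be trained, not random.
rng = np.random.default_rng(0)
vocab, emb_dim, hidden = 1000, 16, 32

E = rng.normal(size=(vocab, emb_dim))                       # embedding matrix
W = rng.normal(size=(4 * hidden, emb_dim + hidden)) * 0.1   # stacked gate weights
b = np.zeros(4 * hidden)
w_out = rng.normal(size=hidden) * 0.1                       # binary output head
b_out = 0.0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_classify(token_ids):
    """Embed a token sequence, run an LSTM over it, and return
    the predicted probability of an adverse event."""
    h = np.zeros(hidden)   # hidden state
    c = np.zeros(hidden)   # cell state
    for t in token_ids:
        z = W @ np.concatenate([E[t], h]) + b
        i, f, o, g = np.split(z, 4)                     # input/forget/output/candidate
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)    # cell-state update
        h = sigmoid(o) * np.tanh(c)                     # new hidden state
    return sigmoid(w_out @ h + b_out)

p = lstm_classify([4, 27, 311, 9])   # a toy sequence of event codes
```

Because the recurrence carries state across time steps, the model can score partial sequences too, which is what enables the "complete or partial event sequences" querying described above.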

Speaker bio

Prabhakar Srinivasan obtained a Master's in Computer Science from DePaul University, Chicago, and has over 13 years of industry experience working for companies like Apple, Yahoo!, and Cisco and start-ups like Coffeemeetsbagel. With a breadth of experience in developing enterprise-scale applications such as recommendation engines and deep learning applications for sales forecasting and demand prediction in supply chains, the author has in-depth knowledge of the tools and technologies used for developing pragmatic machine learning applications.



  • Abhishek Balaji (@booleanbalaji) Reviewer a month ago

    Hi Prabhakar,

Thank you for submitting a proposal. The content looks really interesting for a tutorial at Anthill Inside 2019 (rather than a talk), where you'll have a more intimate audience who can follow along. For us to consider your proposal for evaluation, we need to see slides and a preview video. Your slides must cover the following:

    • Problem statement/context, which the audience can relate to and understand. The problem statement has to be a problem (based on this context) that can be generalized for all.
    • What were the tools/options available in the market to solve this problem? How did you evaluate these, and what metrics did you use for the evaluation?
    • Why did you pick the option that you did?
    • How was the situation before and after implementing the solution you picked/built? Show before-after scenario comparisons and metrics.
    • What compromises/trade-offs did you have to make in this process?
    • What are the privacy, regulatory and ethical considerations when building this solution?
    • What is the one takeaway that you want participants to go back with at the end of this talk? What is it that participants should learn/be cautious about when solving similar problems?

    As next steps, we’d need to see the detailed and/or updated slides by 21 May, in order to close the decision on your proposal. If we don’t receive an update by 21 May, we’d have to move the proposal for consideration to a future conference.
