Submit a talk on data

Submit talks on data engineering, data science, machine learning, big data and analytics throughout 2019

This space is open for proposals on data engineering, data science, machine learning, big data and analytics throughout 2019.

We will host data events throughout 2019, and talks for these conferences will be selected from the proposals submitted here. You can submit a proposal at any time.

If you have queries, write to us at fifthelephant.editorial@hasgeek.com or call us on 7676332020.

Hosted by

The Fifth Elephant, known as one of the best data science and machine learning conferences in Asia, has transitioned into a year-round forum for conversations about data and ML engineering, data science in production, and data security and privacy practices.

Akash Tandon

@akashtandon

Making sense of messy data to track disease outbreaks in India

Submitted Apr 9, 2018

In spite of open data portals cropping up across multiple domains, working with the datasets they provide is difficult. In our bid to identify disease outbreaks and aid preventive health-care, we came across one such data source.

The Ministry of Health and Family Welfare (MoHFW) in India runs the IDSP scheme to identify disease outbreaks at the sub-district and village level across India. Under this scheme, it releases weekly outbreak data as a PDF document. PDFs are notoriously hard to parse and incorporate into data science workflows. We’ll outline how we use Python- and R-based open source tools, including Apache Airflow, along with in-house tools to structure this data and derive useful insights from it.

Outline

  • Introduction and Background
  • Architecture
    • Generalizing the DAG creation workflow on Airflow
    • Getting PDFs from IDSP website
    • Extracting data out of PDFs
    • Data wrangling using Python and R
    • Geography Identification
    • Insights and alert generation
  • Demo (with code snippets)
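
The architecture stages in the outline form a simple dependency chain. As a rough illustration only, the sketch below declares such a chain as plain data and derives a valid run order from it using Python's standard library (graphlib, Python 3.9+). It does not use Airflow, and it is not the speaker's implementation: the task names are hypothetical labels derived from the outline items, and the helper function is an assumption made for this example.

```python
from graphlib import TopologicalSorter

# Hypothetical task names mirroring the outline items above.
# Each task maps to the set of tasks that must finish before it starts.
PIPELINE = {
    "fetch_pdfs": set(),
    "extract_tables": {"fetch_pdfs"},
    "wrangle": {"extract_tables"},
    "identify_geography": {"wrangle"},
    "generate_alerts": {"identify_geography"},
}


def execution_order(dependencies):
    """Return task names in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, task in enumerate(execution_order(PIPELINE), start=1):
        print(step, task)
```

Keeping the dependencies in a dictionary means a new data source could reuse the same ordering logic with a different mapping, which is one plausible reading of "generalizing" DAG creation; how the talk actually does this on Airflow is not stated here.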

Speaker bio

Akash Tandon is a member of the data engineering team at SocialCops, where he is the primary maintainer of their geography identification and entity recognition system. He also contributes to multiple components of the data pipeline.
Prior to this, he was a data engineer at RedCarpetUp. He has also participated in the Google Summer of Code program as a student and as a mentor.

Slides

https://speakerdeck.com/analyticalmonk/making-sense-of-messy-data-to-track-disease-outbreaks-in-india-fifth-elephant-2018

