16–17 Feb 2017 (Thu–Fri), 09:00 AM – 06:00 PM IST
Gaurav Godhwani
Indian budget documents, across the various tiers of government, contain detailed information on the allocations made and the resources raised in a financial year. Unfortunately, these documents are published as messy PDFs, which makes it difficult for researchers, economists and the general public to analyse and use this crucial data. This session will delve into how we can create a data pipeline and leverage computer vision techniques to parse these documents into clean, machine-readable formats, using popular Python libraries (such as PyPDF2, OpenCV and NumPy) along with other open-source tools such as Tabula and CKAN.
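To give a feel for the last stage of such a pipeline, here is a minimal sketch of turning rows that a table extractor (for example Tabula) has already pulled out of a PDF into clean CSV. It is not the speaker's actual code. The helper names (`parse_amount`, `clean_rows`, `to_csv`), the sample rows, the column names and the set of "nil" markers are all assumptions made for illustration. The sketch also assumes that amounts use Indian digit grouping (such as `1,23,456.78`) and that negatives are shown in parentheses.

```python
import csv
import io
from decimal import Decimal, InvalidOperation

# Assumed placeholders for "no value"; real budget PDFs may use others.
NIL_MARKERS = {"", "-", "..", "...", "NA"}


def parse_amount(cell):
    """Convert a raw cell such as '1,23,456.78' to a Decimal, or None if empty."""
    text = cell.strip()
    if text in NIL_MARKERS:
        return None
    # Assumption: a value wrapped in parentheses denotes a negative amount.
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]
    text = text.replace(",", "")  # drop lakh/crore grouping commas
    try:
        value = Decimal(text)
    except InvalidOperation:
        raise ValueError(f"not a numeric cell: {cell!r}") from None
    return -value if negative else value


def clean_rows(raw_rows):
    """Keep the label in column 0 and parse the remaining columns as amounts."""
    cleaned = []
    for row in raw_rows:
        label = " ".join(row[0].split())  # collapse stray whitespace
        cleaned.append([label] + [parse_amount(c) for c in row[1:]])
    return cleaned


def to_csv(header, rows):
    """Serialise cleaned rows to CSV text, writing missing values as empty cells."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        writer.writerow(["" if value is None else value for value in row])
    return buffer.getvalue()


# Hypothetical rows, shaped like the output of a PDF table extractor.
raw = [
    ["Revenue  Receipts ", "1,23,456.78", "..."],
    ["Capital Outlay", "(2,500.00)", "10,000"],
]
rows = clean_rows(raw)
csv_text = to_csv(["head", "actuals", "budget_estimate"], rows)
print(csv_text)
```

Using `Decimal` rather than `float` keeps the amounts exactly as printed in the source document, which matters when figures are later summed or compared against published totals.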
### What’s in it for you?
Building data pipelines for civic engagement is still at an embryonic stage in India. This talk will give data enthusiasts an opportunity to learn about, produce and contribute to open data in their geographies. Attendees will also explore how simple Python scripts and open-source tools can be used to deal with complex, varied data formats.
The talk will be organized as:
Requirements: knowledge of Python 2.7 and an acquaintance with basic data mining.