Jul 2016
25 Mon
26 Tue
27 Wed
28 Thu 08:30 AM – 06:25 PM IST
29 Fri 08:30 AM – 06:15 PM IST
30 Sat 08:45 AM – 05:00 PM IST
31 Sun 08:15 AM – 06:00 PM IST
“In theory, there is no difference between theory and practice. But in practice, there is.” - Yogi Berra
Once the task of prototyping a data science solution has been accomplished on a local machine, the real challenge begins in how to make it work in production. To ensure that the plumbing of the data pipeline will work in production at scale is both an art and a science. The science involves understanding the different tools and technologies needed to make the data pipeline connect, while the art involves making the trade-offs needed to tune the data pipeline so that it flows.
In this workshop, you will learn how to build a scalable data science platform with set up and conduct data engineering using Pandas and Luigi, build a machine learning model with Apache Spark and deploy it as predictive api with Flask
The biggest challenge in building a data science platform is to glue all the moving pieces together. Typically, a data science platform consists of:
Over time the amount of data stored that needs to be processed increases and it necessitates the need to run the Data Science process frequently. But different technologies/stack solve different parts of the Data Science problem. Leaving it to respective teams introduces lag into the system. What is needed is an automated pipeline process - one that can be invoked based on business logic (real time, near-real-time etc) and a guarantee that it will maintain data integrity.
Details of the workshop
We all know that 80% of the effort is spent on data engineering while the rest is spent in building the actual machine learning models. Data engineering starts with identifying the right data sources. Data sources can be databases, third party APIs, HTML documents which needs to be scrapped and so on. Acquiring data from databases is a straight forward job, while acquiring data from third party APIs and scrapping may come with its own complexities like page visit limits, API rate limiting etc. Once we manage to acquire data from all these sources, the next job is to clean the data.
We will be covering the following topics for data engineering:
Building a robust and scalable machine learning platform is a hard job. As the data size increases, the need for more computational capabilities increase. So how do you build a system that can scale by just adding more hardware and not worrying about changing the code too much every time? The answer to that is to use Apache Spark ML. Apache Spark lets us build machine learning platforms by providing distributed computing capabilities out of the box.
We will be covering the following topics for Machine Learning:
It ain’t over until the fat lady sings. Making a system API driven is very essential as it ensures the usage of the built machine learning model , thereby helping other systems integrate the capabilities with ease.
We will be covering the following topics for API:
Pre-Requisites:
Speaker Bio:
Nischal is co founder and Data Engineer at Unnati Data Labs who enables the Data Scientists to work at peace. He makes sure that they get the data they need and in the way they need it. Previously he has built, from scratch, various systems for E-commerce like catalog management, recommendation engines and market basket analysis to name a few during his tenure at Redmart.
Raghotham is a co founder Data Scientist at Unnati Data Labs, who can work across the complete stack. Previously, at Touchpoints Inc., He single handedly built a data analytics platform for a fitness wearable company. With Redmart, he worked on the CRM system and has built a sentinment analyzer for Redmart’s Social Media. Prior to Redmart and Touchpoints, Raghotham worked at SAP Labs where he was a core part of what is currently SAP’s framework for building web and mobile products. He was a part of multiple SAP wide events helping to spread the knowledge both internally and to customers.
They have conducted workshops in the field of Deep learning across the world. They are strong believers of open source and love to architect big, fast and reliable systems.
{{ gettext('Login to leave a comment') }}
{{ gettext('Post a comment…') }}{{ errorMsg }}
{{ gettext('No comments posted yet') }}