Doing Data Science on Cloud

This submission has been added to the schedule

Submitted Nov 20, 2017

Section: Full talk
Technical level: Advanced

As the data sets used to run data science (DS) models grow, it is important to look at infrastructure options that scale well enough to run DS algorithms successfully. Using that infrastructure cost-effectively is the need of the hour: for example, running a task on multiple GPUs for only a finite amount of time.
The talk also includes a discussion of a generic infrastructure.

Almost all cloud vendors (AWS, Google, Microsoft) provide different kinds of services for this situation. This talk will primarily compare the advantages and disadvantages of the services offered by these providers. It will also look at the various options for running tasks on a particular cloud provider, and includes a discussion of MLaaS (machine learning as a service) offerings.

In short, the talk will answer the following questions:

For generic infrastructure on cloud

How to support altogether different flavours of DS and non-DS jobs on a cloud vendor? For example:
  Constructing a NumPy file
  Running Spark jobs for transformations
  Any new hypothetical task on any new technology
How to work with different versions of the languages supported out of the box?
How to have an auto-scalable infrastructure that is cost-effective?
How to have a cloud-vendor-independent deployment for your DS jobs?

For MLaaS services

How can we install a library that is not pre-installed?
How to use custom hardware resources?

Outline

Importance of running DS on Cloud
Introduction to MLaaS
Demo: Running DS models using TensorFlow and Keras on Google Cloud ML (using GPUs).
Doing data science using workbenches - Sense.io, Domino Data Lab, Google Datalab
Demo: Running a simple model on Google Datalab
Cognitive APIs
Demo: Running predictions on an image with the Google Vision API via REST calls.
Discussion on how to develop a generic, infinitely scalable infrastructure - Why? What? How?
Demo: Running multiple R jobs to show the auto-scaling feature of the infrastructure.
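
To give a flavour of the Vision API demo listed in the outline, here is a minimal sketch of a label-detection REST call in Python. It is not the code used in the talk: the function names `build_label_request` and `annotate`, the `max_results` default, and the use of an API key for authentication are assumptions made for illustration. Only the request body is built locally; `annotate` needs network access and a valid key.

```python
import base64
import json
import urllib.request

# Public REST endpoint of the Cloud Vision API (v1).
VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_label_request(image_bytes, max_results=5):
    """Build the JSON body for a LABEL_DETECTION request (hypothetical helper)."""
    return {
        "requests": [
            {
                # The image is sent inline as base64-encoded text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
            }
        ]
    }


def annotate(image_bytes, api_key):
    """POST the request and return the parsed JSON response.

    Assumes API-key authentication; requires network access and a valid key.
    """
    body = json.dumps(build_label_request(image_bytes)).encode("utf-8")
    req = urllib.request.Request(
        VISION_URL + "?key=" + api_key,
        data=body,  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Calling `annotate(open("photo.jpg", "rb").read(), api_key)` with a hypothetical image file would return the service's JSON response; the labels, if any, are expected under `responses[0]["labelAnnotations"]`.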

Speaker bio

Swapnil currently contributes to the Schlumberger Data Science team, applying analytics in the field of oil and natural gas. Prior to this, he was part of the Snapdeal Realtime Analytics team as Lead Engineer. In the past, Swapnil has also worked as a Cloudera trainer. He believes in learning and in sharing what he learns with the community, and is a frequent speaker at meetups and an active presenter at conferences.
With more than 8 years of experience, Swapnil has contributed in the domains of BFSI, ad serving and eCommerce, with Hadoop, Spark and GCP as his primary tech stack.
Past conferences & Meetups:
https://anthillinside.talkfunnel.com/2017-miniconf-pune/15-doing-data-science-on-cloud
https://expert-talks.in/
https://fifthelephant.talkfunnel.com/pune-meetup-2017/3-time-processing-and-watermarks-using-google-pub-su
http://www.bigdatainnovation.org/delhi/2015/India_Bigdata_Week/speakers
Dr. Dobb's conference, Bangalore, April 11-12, 2014

Slides

https://docs.google.com/presentation/d/e/2PACX-1vTn4_oyB59km-kcRuD1heGFBIexVW7N-UlDb62f7n_62NQvpgQdfV7eym4oq69kUU0-Gswr1kmfRQLO/pub?start=false&loop=false&delayms=3000

Miniconf on Cloud Server Management (Mumbai)