Doing Data Science on Cloud
Submitted by Swapnil Dubey (@swapnildubey) on Monday, 20 November 2017
With the increase in data size for running DS models, it is important to look into infrastructure options that provide enough scalability to run DS algorithms successfully. Using infrastructure optimally in terms of cost is the need of the hour: for example, running a task on multiple GPUs for only a finite amount of time.
A discussion around a generic infrastructure.
Almost all cloud vendors (AWS, Google, Microsoft) provide different kinds of services for this situation. This talk will primarily compare the advantages and disadvantages of the services offered by these cloud providers. It will also look into various options for running tasks within a particular cloud provider. A discussion of MLaaS services.
In short, the talk will address the following questions:
For generic infrastructure on the cloud:
- How to support altogether different flavours of DS as well as non-DS jobs on a cloud vendor? For example:
  - Constructing a numpy file
  - Running Spark jobs for transformations
  - Any new hypothetical task on any new technology
- How to work with different versions of languages supported out of the box?
- How to have an auto-scalable infrastructure that is cost effective?
- How to have a cloud-vendor-independent deployment for your DS jobs?
For MLaaS services:
- How can we install a library that is not preinstalled?
- How can we use custom hardware resources?
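To make the auto-scaling question above more concrete, one common shape for a cost-aware policy is to size the worker pool from the backlog of pending jobs, scale in to a floor when the queue is empty, and cap the pool to bound spend. The sketch below is hypothetical: `desired_workers`, `jobs_per_worker`, `min_workers` and `max_workers` are illustrative names and are not part of any cloud vendor's API.

```python
import math


def desired_workers(pending_jobs: int, jobs_per_worker: int,
                    min_workers: int = 0, max_workers: int = 10) -> int:
    """Return how many workers to run for the current queue (hypothetical policy).

    - Scales out just enough to cover the backlog.
    - Scales in to min_workers when the queue is empty, so idle machines
      are not paid for.
    - Never exceeds max_workers, which bounds the worst-case cost.
    """
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    if pending_jobs < 0:
        raise ValueError("pending_jobs cannot be negative")
    if min_workers > max_workers:
        raise ValueError("min_workers cannot exceed max_workers")

    # Workers needed to drain the backlog, rounded up.
    needed = math.ceil(pending_jobs / jobs_per_worker)
    # Clamp into the [min_workers, max_workers] range.
    return max(min_workers, min(needed, max_workers))
```

For example, with 9 queued R jobs and 4 jobs per worker the rule asks for 3 workers, and an empty queue brings the pool back to `min_workers`. A real deployment would feed this from a queue or monitoring metric and hand the result to the provider's own scaling mechanism.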
Importance of running DS on Cloud
Introduction to MLaaS
Demo: Running DS models using TensorFlow and Keras on Google Cloud ML (using GPUs).
Doing data science using workbenches: Sense.io, Domino Data Lab, Google Datalab
Demo: Running a simple model on Google Datalab
Demo: Running predictions on an image with the Google Vision API using REST calls.
Discussion on how to develop a generic, infinitely scalable infrastructure: Why? What? How?
Demo: Running multiple R jobs to show the auto-scaling feature of the infrastructure.
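For the Vision API demo, a REST call amounts to POSTing a JSON body to the `images:annotate` endpoint. The following is a minimal sketch, assuming API-key authentication and label detection as the requested feature; the endpoint and field names (`requests`, `image.content`, `features`, `LABEL_DETECTION`, `maxResults`, `labelAnnotations`) follow Google's public v1 REST documentation, while `build_label_request`, `annotate` and `top_labels` are hypothetical helper names, not necessarily what the talk's demo uses.

```python
import base64
import json
import urllib.request

# Public REST endpoint of the Cloud Vision API (v1).
VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_label_request(image_bytes: bytes, max_results: int = 5) -> dict:
    """Build the JSON body asking for up to max_results labels for one image."""
    return {
        "requests": [
            {
                # Image bytes are sent inline, base64-encoded.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
            }
        ]
    }


def annotate(image_path: str, api_key: str) -> dict:
    """POST the request and return the parsed JSON response (needs network access)."""
    with open(image_path, "rb") as f:
        body = json.dumps(build_label_request(f.read())).encode("utf-8")
    req = urllib.request.Request(
        f"{VISION_URL}?key={api_key}",  # assumption: API-key auth, not OAuth
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def top_labels(response: dict) -> list:
    """Extract (description, score) pairs from the first image's annotations."""
    responses = response.get("responses") or [{}]
    return [(a["description"], a["score"])
            for a in responses[0].get("labelAnnotations", [])]
```

Usage would be along the lines of `top_labels(annotate("cat.jpg", "YOUR_API_KEY"))`, where the file name and key are placeholders. Keeping request construction separate from the HTTP call makes the payload easy to inspect and test offline.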
Swapnil is currently contributing to the Schlumberger Data Science team, applying analytics in the field of oil and natural gas. Prior to this, he was a Lead Engineer in the Snapdeal Realtime Analytics team. In the past, Swapnil has worked as a Cloudera trainer. He believes in learning and sharing his learning across the community, and is a frequent speaker at meetups and an active presenter at conferences.
With more than 8 years of experience, Swapnil has contributed in the domains of BFSI, ad serving and e-commerce, with Hadoop, Spark and GCP as his primary tech stack.
Past conferences & Meetups:
- https://anthillinside.talkfunnel.com/2017-miniconf-pune/15-doing-data-science-on-cloud
- https://expert-talks.in/
- https://fifthelephant.talkfunnel.com/pune-meetup-2017/3-time-processing-and-watermarks-using-google-pub-su
- http://www.bigdatainnovation.org/delhi/2015/India_Bigdata_Week/speakers
- Dr Dobbs conference, Bangalore, April 11-12, 2014