The Fifth Elephant 2018

The seventh edition of India's best data conference

Business analytics on the cloud - a scalable model with R

Praveen Chandrasekharan

@pchandra

R is a great language for data analysis, and analysts love it, but it is inherently difficult to scale because of its single-threaded nature and its lack of libraries and web frameworks. This talk is about how we overcame or worked around those limitations to plug R into a scalable cloud platform. It also covers other design considerations that make it practical to run analytics on larger datasets on a cloud platform with point-and-click execution of functions.

Outline

  1. Problem Statement : How do we build an analytics platform on the cloud with an R backend, and how can we leverage pre-built R functions to enable point-and-click function execution from the browser? The challenges include overcoming R's single-threaded nature, executing analytic functions on datasets efficiently, and showing results to users in the browser, all in a performant manner
  2. Existing solutions and their drawbacks : Microsoft R, Shiny, etc.
  3. Factors which influenced the solutioning : Preloading functions and horizontal scalability
  4. Queue based architecture with diagram
  5. Building message queue client in R
  6. Point-and-click function execution details
  7. Preload functions and writing an orchestrator in R
  8. Input and output file delivery : Using cloud storage (like Azure File Store) mounted as a local drive on the R servers, with Nginx web servers handling output
  9. Big Data Processing using SparkR : A separate execution path through SparkR clusters, chosen based on the function and the data size
  10. Efficient mechanisms for showing datasets on the browser
  11. Wrapping Up : How the above design considerations have helped us run analytics with R on the cloud from the browser
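
To make the queue-based design in the outline concrete, here is a minimal sketch of the general pattern: each worker handles one job at a time, functions are loaded once at start-up, and capacity grows by adding workers. It is an illustration only, not the platform described in the talk. It is written in Python rather than R for brevity, it uses an in-process queue in place of a real message broker, and every name in it (PRELOADED, worker, run_jobs) is hypothetical.

```python
import queue
import statistics
import threading

# Hypothetical registry standing in for "preloaded functions": each function
# is loaded once when the worker starts, not on every request.
PRELOADED = {
    "mean": statistics.mean,
    "median": statistics.median,
}


def worker(jobs, results):
    """Pull jobs off the queue one at a time until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        job_id, func_name, data = job
        results.put((job_id, PRELOADED[func_name](data)))


def run_jobs(job_list, n_workers=2):
    """Fan jobs out to n_workers workers and collect results by job id."""
    jobs = queue.Queue()
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for job in job_list:
        jobs.put(job)
    for _ in threads:
        jobs.put(None)  # one stop signal per worker
    for t in threads:
        t.join()
    collected = {}
    while not results.empty():
        job_id, value = results.get()
        collected[job_id] = value
    return collected


if __name__ == "__main__":
    print(run_jobs([("a", "mean", [1, 2, 3]), ("b", "median", [1, 5, 9])]))
```

In this sketch, scaling out means raising n_workers. In a deployment like the one the outline describes, the equivalent step would presumably be adding more single-threaded R processes that consume from a shared message queue.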

Requirements

https://www.youtube.com/edit?o=U&video_id=n8NlwkAyj5M

Speaker bio

I am sharing my experience of building a cloud platform that successfully addressed challenges such as scaling R and processing large datasets on the cloud.

LinkedIn Profile : www.linkedin.com/in/praveencpillai

Slides

https://docs.google.com/presentation/d/1Vml_k_OXYo3Vp6Cby6PpeKhzDNlUrA3Z6IffH_pzHI4/edit#slide=id.p
