“R” is a language analysts love for data analysis, but it is inherently difficult to scale because it is single-threaded and lacks libraries and web frameworks for serving it. This talk is about how we overcame or worked around these limitations to plug R into a scalable cloud platform. It also covers other design considerations that make it practical to run analytics on larger datasets on a cloud platform, with point-and-click execution of functions.
- Problem Statement : How do we build a cloud-hosted analytics platform with an R backend, and how can we leverage pre-built R functions to enable point-and-click function execution from the browser? The challenges include overcoming R's single-threadedness, executing analytic functions on datasets efficiently when users point and click in the browser, and showing the results to users, all in a performant manner.
- Existing solutions and their drawbacks : Microsoft R, Shiny, etc.
- Factors that influenced the solution design : preloading functions and horizontal scalability
- Queue-based architecture (with diagram)
- Building a message queue client in R
- Point-and-click function execution details
- Preloading functions and writing an orchestrator in R
- Input and output file delivery : using cloud storage (such as Azure File Store) mounted as a local drive on the R servers and on the Nginx web servers that handle output
- Big data processing using SparkR : routing jobs along different paths to SparkR clusters depending on the function and the data size
- Efficient mechanisms for showing datasets in the browser
- Wrapping up : how the above design considerations made it possible to run analytics in R on the cloud from the browser
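The queue-based architecture with preloaded functions outlined above can be illustrated with a minimal sketch. It is written in Python rather than R for brevity, and every name in it (`PRELOADED_FUNCTIONS`, `worker`, `run`, the JSON job format) is hypothetical, not the talk's actual implementation. Each worker handles one job at a time, mirroring a single-threaded R process; throughput grows by starting more workers against the same queue, and functions are registered once up front instead of being loaded per request. In-process threads and `queue.Queue` stand in for separate R servers and a real message broker.

```python
import json
import queue
import statistics
import threading

# Hypothetical registry of "preloaded" analytic functions. In the talk's setting
# these would be R functions loaded once when each R worker starts.
PRELOADED_FUNCTIONS = {
    "mean": lambda values: statistics.mean(values),
    "summary": lambda values: {"min": min(values), "max": max(values), "n": len(values)},
}

STOP = None  # sentinel message telling a worker to exit


def worker(jobs: queue.Queue, results: queue.Queue) -> None:
    """Single-threaded worker: take one job message at a time, like one R process."""
    while True:
        message = jobs.get()
        if message is STOP:
            jobs.task_done()
            break
        job = json.loads(message)  # e.g. {"job_id": 1, "function": "mean", "args": [[1, 2]]}
        func = PRELOADED_FUNCTIONS.get(job["function"])
        if func is None:
            outcome = {"job_id": job["job_id"], "error": "unknown function"}
        else:
            outcome = {"job_id": job["job_id"], "result": func(*job["args"])}
        results.put(outcome)
        jobs.task_done()


def run(job_messages, n_workers=3):
    """Scale horizontally by starting several single-threaded workers on one queue."""
    jobs, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(jobs, results)) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for message in job_messages:
        jobs.put(message)
    # FIFO order guarantees every job is dequeued before any STOP sentinel.
    for _ in threads:
        jobs.put(STOP)
    for t in threads:
        t.join()
    collected = []
    while not results.empty():
        collected.append(results.get())
    return sorted(collected, key=lambda r: r["job_id"])


if __name__ == "__main__":
    messages = [
        json.dumps({"job_id": 1, "function": "mean", "args": [[1, 2, 3]]}),
        json.dumps({"job_id": 2, "function": "summary", "args": [[4, 9, 1]]}),
    ]
    print(run(messages))
```

Because workers pull from a shared queue instead of receiving requests pushed to a specific server, a busy worker simply does not take the next job, so adding workers increases throughput without any change on the client side.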
https://www.youtube.com/edit?o=U&video_id=n8NlwkAyj5M
I will share my experience building a cloud platform that successfully addressed challenges such as scaling R and processing large datasets on the cloud.
LinkedIn profile : www.linkedin.com/in/praveencpillai
https://docs.google.com/presentation/d/1Vml_k_OXYo3Vp6Cby6PpeKhzDNlUrA3Z6IffH_pzHI4/edit#slide=id.p