Big Data Analytics with R
Submitted by Neeta Pande (@neetapande) on Tuesday, 23 April 2013
Section: Analytics and Visualization
Technical level: Intermediate
Attendees will come away with an understanding of the high-performance and parallel computing landscape in R. This area of R is undergoing rapid change, and the objective of this session is to provide insight into the various active contributions to it. The session also delves deeper into analyzing moderately large data sets, which today presents a huge opportunity to work around R's "everything in memory" challenge without large infrastructure or software setup and the costs that come with it.
When we hear about parallelism and big data processing in R, we usually think of grid computing, or of parallel computing with Hadoop or Revolution Analytics, which requires infrastructure setup and typically skills and programming beyond R. These may be required for analyzing really big data sets (terabytes and up). However, for handling data up to a few hundred GB, R packages such as ff and bigmemory can solve a large number of use cases without additional memory or hardware setup. These techniques, though useful, are not well known, and they are the primary focus of this session.
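As a rough illustration of the idea, the following is a minimal sketch of a file-backed matrix in bigmemory, assuming the package is installed. The sizes, file names, and chunking scheme are hypothetical and chosen only for the example; they are not taken from the session material.

```r
# Assumes the bigmemory package is available: install.packages("bigmemory")
library(bigmemory)

# Hypothetical dimensions and a fresh temporary directory for the backing files.
n_rows <- 1e6
n_cols <- 3
backing_dir <- tempfile("bigmem_demo_")
dir.create(backing_dir)

# Create a matrix whose data is stored in a file on disk,
# so it does not have to fit in R's in-memory heap.
x <- filebacked.big.matrix(
  nrow = n_rows, ncol = n_cols, type = "double",
  backingfile = "demo.bin",
  descriptorfile = "demo.desc",
  backingpath = backing_dir
)

# Fill the matrix chunk by chunk; only one chunk is built in RAM at a time.
chunk_size <- 1e5
for (start in seq(1, n_rows, by = chunk_size)) {
  idx <- start:min(start + chunk_size - 1, n_rows)
  x[idx, 1] <- as.numeric(idx)
  x[idx, 2] <- as.numeric(idx) * 2
  x[idx, 3] <- 1
}

# Aggregate the first column the same way, one chunk at a time.
col1_sum <- 0
for (start in seq(1, n_rows, by = chunk_size)) {
  idx <- start:min(start + chunk_size - 1, n_rows)
  col1_sum <- col1_sum + sum(x[idx, 1])
}
```

The same descriptor file can later be passed to attach.big.matrix() to reopen the data without reloading it. The ff package follows a comparable disk-backed approach with its own API.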
Neeta Pande, Data Architect, Intuit: Neeta has about 13 years of experience in Business Intelligence and Analytics. She has extensive experience architecting and engineering data analytics in the BFSI, manufacturing, and personal finance domains. Her recent focus areas include usage behavior analysis, real-time customer behavior prediction and contextual personalization service platforms, and the design of scalable and sustainable technology platforms for solving big data problems.