The Fifth Elephant 2013

An Event on Big Data and Cloud Computing

Neeta Pande

@neetapande

Big Data Analytics with R

Submitted Apr 23, 2013

Attendees will gain an understanding of the high performance and parallel computing landscape in R. This area is undergoing rapid change, and the objective of this session is to provide insight into the various active contributions to it. We will also delve deeper into analyzing moderately large data sets, which presents a huge opportunity today as a solution to the “everything in memory” challenge in R, without requiring heavy infrastructure/software setup or costs.
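As one concrete illustration of that landscape, the sketch below uses the base R parallel package to distribute a simple bootstrap across worker processes; the bootstrap function, the mtcars data set and the worker count are illustrative assumptions, not part of the session material.

```r
## Minimal sketch of in-R parallelism with the base "parallel" package.
## The bootstrap task and core count below are illustrative assumptions.
library(parallel)

one_fit <- function(i) {
  # resample the rows of mtcars and refit a simple linear model
  d <- mtcars[sample(nrow(mtcars), replace = TRUE), ]
  coef(lm(mpg ~ wt, data = d))
}

cl  <- makeCluster(4)                    # start 4 worker processes
res <- parLapply(cl, 1:1000, one_fit)    # run 1000 bootstrap fits in parallel
stopCluster(cl)
```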

Outline

When we hear about parallelism and big data processing in R, we think of grid computing, parallel computing with Hadoop, or Revolution Analytics, all of which require infrastructure setup and typically a skillset and programming beyond R. These may be necessary for analyzing really big data sets (terabytes and above). However, for handling data up to a few hundred GB, packages like ff and bigmemory in R can solve a large number of use cases without additional memory or hardware. These techniques, though useful, are not very well known and are the primary focus of this session.
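For the larger-than-memory case, here is a minimal sketch using the bigmemory package named above; the file names, matrix dimensions and the column-mean computation are illustrative assumptions.

```r
## Minimal sketch: a file-backed matrix that lives on disk (memory-mapped),
## so the full data set never has to fit in RAM. File names and sizes are
## assumptions for illustration.
library(bigmemory)

x <- filebacked.big.matrix(nrow = 1e7, ncol = 3, type = "double",
                           backingfile    = "big_x.bin",
                           descriptorfile = "big_x.desc")

x[, 1] <- rnorm(1e7)   # write one column into the memory-mapped backing file
mean(x[, 1])           # pull a single column back into RAM and summarise it
```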

Speaker bio

Neeta Pande, Data Architect, Intuit: Neeta has about 13 years of experience in Business Intelligence and Analytics. She has extensive experience architecting and engineering data analytics in the BFSI, manufacturing and personal finance domains. Her recent focus areas include usage behavior analysis, real-time customer behavior prediction and contextual personalization service platforms, and designing scalable and sustainable technology platforms for solving big data problems.

