The Fifth Elephant 2014

A conference on big data and analytics

Swapnil Dubey

@swapnildubey1984

Advanced Big Data Analytics using Apache Mahout and Giraph

Submitted Jun 2, 2014

Graph and machine learning problems are difficult to address with the MapReduce framework. Most of them require multiple iterations of complex algorithms, which are tricky to implement in MapReduce. However, the Hadoop ecosystem offers two frameworks aimed at exactly these problems. Apache Giraph is a graph-processing framework designed to run large-scale graph algorithms. Apache Mahout is a framework that provides implementations of distributed machine learning algorithms.

Outline

In this session we will discuss the following topics:
PageRank with Apache Giraph,
Single-source shortest-path with Apache Giraph,
Using Apache Giraph to perform a distributed breadth-first search,
Collaborative filtering with Apache Mahout,
Clustering with Apache Mahout,
Sentiment classification with Apache Mahout.
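
To give a flavour of the vertex-centric style behind the Giraph topics above, here is a minimal sketch of single-source shortest paths written as message-passing supersteps. It is plain Python, not the Giraph API: the function name `sssp_supersteps`, the adjacency-list layout, and the sample graph are illustrative assumptions, and edge weights are assumed to be non-negative.

```python
import math
from collections import defaultdict


def sssp_supersteps(edges, source):
    """Single-source shortest paths in a vertex-centric, superstep style.

    edges: dict mapping each vertex to a list of (neighbour, weight) pairs.
    Assumes non-negative weights. Returns a dict of distances from source.
    """
    # Collect every vertex, including ones that only appear as targets.
    vertices = set(edges)
    for targets in edges.values():
        vertices.update(target for target, _ in targets)

    distance = {vertex: math.inf for vertex in vertices}

    # Superstep 0: only the source receives a message (distance 0 to itself).
    inbox = {source: [0.0]}

    # Each loop iteration is one superstep; stop when no messages are in flight.
    while inbox:
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            best = min(messages)
            # A vertex only acts when a message improves its known distance.
            if best < distance[vertex]:
                distance[vertex] = best
                for neighbour, weight in edges.get(vertex, []):
                    outbox[neighbour].append(best + weight)
        inbox = outbox

    return distance


# Hypothetical sample graph, for illustration only.
sample_graph = {
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 2), ("d", 6)],
    "c": [("d", 1)],
    "d": [],
}
sample_distances = sssp_supersteps(sample_graph, "a")
print(sample_distances)
```

With every edge weight set to 1, the same loop behaves like a breadth-first search: the distance assigned to each vertex is its hop count from the source.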

After the session, the audience will have a basic understanding of Apache Mahout and Apache Giraph. They will also be able to implement basic machine learning concepts.

Requirements

A Hadoop environment running in pseudo-distributed mode.

Speaker bio

Swapnil Dubey is a Big Data Hacker and Trainer at GoDataDriven. He has close to 5.5 years of experience in the software industry and is a Cloudera Certified Hadoop Developer. He has worked on several projects in the BFSI and e-commerce domains, and has worked extensively in the data quality domain using Java/J2EE and Hadoop as the primary technologies. He has an inclination towards open source technologies and likes to explore new frameworks.

Swapnil also has experience presenting on big data at national-level conferences. He recently presented two topics at the Dr. Dobb's Conference (April 2014) in Bangalore:

  1. Apache Flume
  2. Solving complex problems with Hadoop

Slides

https://docs.google.com/presentation/d/1OrGePeNFVXY2OG-vUrsjLxtptukoJ8z5_-jeIpq2vl4/edit#slide=id.g34e085156_012
