The Fifth Elephant 2014

A conference on big data and analytics

Advanced Big Data Analytics using Apache Mahout and Giraph

Submitted by swapnil dubey (@swapnildubey1984) on Monday, 2 June 2014

Section: Workshops
Technical level: Advanced


Graph and machine learning problems are difficult to address with the MapReduce framework. Most of them require multiple iterations of complex algorithms, which are tricky to implement in MapReduce. However, the Hadoop ecosystem offers two frameworks for such problems. Apache Giraph is a graph-processing framework designed to run large-scale graph algorithms. Apache Mahout is a framework that provides implementations of distributed machine learning algorithms.
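To make the iteration point concrete, here is a minimal sketch of PageRank written as a sequence of supersteps, the vertex-centric style that graph frameworks such as Giraph are built around. It is plain Python on a single machine and does not use the Giraph API; the function name `pagerank_supersteps`, the damping factor of 0.85, and the toy graph are all assumptions chosen for illustration. In plain MapReduce, each pass of the outer loop would typically be a separate job.

```python
from collections import defaultdict


def pagerank_supersteps(graph, damping=0.85, supersteps=50):
    """Illustrative PageRank; graph maps each vertex to its out-neighbours.

    Assumes every vertex, including pure sinks, appears as a key.
    """
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(supersteps):
        # "Message passing": every vertex sends an equal share of its
        # current rank along each outgoing edge.
        inbox = defaultdict(float)
        dangling = 0.0
        for v, neighbours in graph.items():
            if neighbours:
                share = rank[v] / len(neighbours)
                for u in neighbours:
                    inbox[u] += share
            else:
                # Vertices without out-edges spread their rank evenly.
                dangling += rank[v]
        # "Compute": every vertex combines its incoming messages.
        rank = {
            v: (1 - damping) / n + damping * (inbox[v] + dangling / n)
            for v in graph
        }
    return rank


# Hypothetical four-page web graph.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank_supersteps(graph)
```

Each superstep needs the ranks produced by the previous one, which is why the algorithm does not fit naturally into a single map and reduce pass.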


In this session we will discuss the following topics:

  1. PageRank with Apache Giraph
  2. Single-source shortest path with Apache Giraph
  3. Using Apache Giraph to perform a distributed breadth-first search
  4. Collaborative filtering with Apache Mahout
  5. Clustering with Apache Mahout
  6. Sentiment classification with Apache Mahout
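As a rough illustration of two of the graph topics, the sketch below computes single-source shortest paths by passing messages between vertices in supersteps; with every edge weight set to 1 the same routine yields breadth-first search hop counts. It is plain Python rather than Giraph code, it assumes non-negative edge weights, and the name `sssp_supersteps` and the sample graph are hypothetical.

```python
import math


def sssp_supersteps(edges, source):
    """Illustrative shortest paths; edges maps vertex -> [(neighbour, weight)].

    Assumes non-negative weights and that every vertex appears as a key.
    """
    distance = {v: math.inf for v in edges}
    inbox = {source: [0.0]}  # the source starts with a distance of zero
    while inbox:
        outbox = {}
        for v, messages in inbox.items():
            best = min(messages)
            # A vertex only acts when it learns of a shorter path.
            if best < distance[v]:
                distance[v] = best
                for u, weight in edges[v]:
                    outbox.setdefault(u, []).append(best + weight)
        # No messages in flight means every vertex has halted.
        inbox = outbox
    return distance


# Hypothetical weighted graph; "e" cannot be reached from "a".
edges = {
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 2), ("d", 6)],
    "c": [("d", 1)],
    "d": [],
    "e": [("a", 1)],
}
distances = sssp_supersteps(edges, "a")

# Breadth-first search is the special case where every edge costs 1.
unit_edges = {v: [(u, 1) for u, _ in nbrs] for v, nbrs in edges.items()}
hops = sssp_supersteps(unit_edges, "a")
```

The loop ends when a superstep produces no messages, which mirrors the halting condition used in vertex-centric frameworks.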

After the session, the audience will have a basic idea of Apache Mahout and Apache Giraph. They will also be able to implement basic machine learning concepts.
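As one example of such a basic concept, the sketch below shows item-based collaborative filtering using simple co-occurrence counts. It is a single-machine illustration in plain Python, not the Mahout API, and the function name `cooccurrence_recommend` and the preference data are made up for the example.

```python
from collections import defaultdict
from itertools import permutations


def cooccurrence_recommend(preferences, user, top_n=2):
    """Illustrative recommender; preferences maps user -> set of liked items."""
    # Count how often each ordered pair of items is liked by the same user.
    cooccurrence = defaultdict(lambda: defaultdict(int))
    for items in preferences.values():
        for i, j in permutations(items, 2):
            cooccurrence[i][j] += 1

    # Score unseen items by how often they co-occur with the user's items.
    seen = preferences[user]
    scores = defaultdict(int)
    for i in seen:
        for j, count in cooccurrence[i].items():
            if j not in seen:
                scores[j] += count

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical preference data.
preferences = {
    "u1": {"hadoop", "giraph", "mahout"},
    "u2": {"hadoop", "mahout"},
    "u3": {"hadoop", "giraph"},
    "u4": {"hadoop"},
    "u5": {"mahout", "spark"},
}
recommended = cooccurrence_recommend(preferences, "u2")
```

A distributed implementation would build the co-occurrence counts in parallel across the cluster; the scoring idea stays the same.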


A Hadoop environment running in pseudo-distributed mode.

Speaker bio

Swapnil Dubey is a Big Data Hacker and Trainer at GoDataDriven. He has close to 5.5 years of experience in the software industry and is a Cloudera Certified Hadoop Developer. He has worked on several projects in the BFSI and e-commerce domains, and has worked extensively in the data quality domain using Java/J2EE and Hadoop as the primary technologies. He has an inclination towards open source technologies and likes to explore new frameworks.

Swapnil has also presented on big data at various national-level conferences. He recently presented two topics at the Dr. Dobb's Conference (April 2014) in Bangalore:

  1. Apache Flume
  2. Solving complex problems with Hadoop



  • Rajesh Kumar (@rajeshxebia) 5 years ago

    Way to go.

  • vipul khurmi (@vkhurmi) 5 years ago

    Looking for good hands-on.

  • Sunil Pandey (@sunilp) 5 years ago

    Well written and neatly explained!!

  • Abhishek Sharma (@abhishekonhasgeek) 5 years ago

    Looks really interesting, looking forward to it!! :)

  • Viral B. Shah (@viralbshah) 5 years ago

    Like this talk. Also hoping you can talk about some of the challenges of building with MapReduce (given that we are likely to have a talk on Tez). Also, it would be good to have an application and use case perspective, on where Giraph is deployed and how it is used.

  • swapnil dubey (@swapnildubey1984) Proposer 5 years ago

    @Viral, I will be discussing day-to-day use cases where graph analysis is used these days, such as at Facebook and Twitter. Along with that, I will also talk about what exactly analytics means in such scenarios.

    I will surely try to implement the concept using the basic MapReduce paradigm, and then I will try to convince the audience of the solution that Giraph and Mahout provide.
