Advanced Big Data Analytics using Apache Mahout and Giraph
Graph and machine learning problems are difficult to address with the MapReduce framework. Most of them require multiple iterations of complex algorithms, which are tricky to implement in MapReduce. However, the Hadoop ecosystem offers two frameworks aimed at exactly these problems. Apache Giraph is a graph-processing framework designed to run large-scale graph algorithms. Apache Mahout is a framework that provides implementations of distributed machine learning algorithms.
This session covers the following topics:
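To make the point about iteration concrete, here is a minimal sketch of PageRank written in plain Python. It is not Giraph code and does not use the Giraph API; it only imitates the vertex-centric style, where in each round (a "superstep") every vertex sends its rank share along its outgoing edges and then recomputes its own rank from what it received. The function name `pagerank`, the sample graph, and the parameter values are assumptions chosen for illustration.

```python
def pagerank(graph, damping=0.85, supersteps=50):
    """Iterative PageRank over an adjacency dict {vertex: [out_neighbours]}.

    Assumption: every edge target also appears as a key in `graph`.
    """
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}  # start from a uniform distribution

    for _ in range(supersteps):
        inbox = {v: 0.0 for v in graph}  # rank received in this superstep
        dangling = 0.0                   # rank held by vertices with no out-edges

        # "Send" phase: each vertex splits its rank evenly among its out-edges.
        for v, out_edges in graph.items():
            if out_edges:
                share = rank[v] / len(out_edges)
                for target in out_edges:
                    inbox[target] += share
            else:
                dangling += rank[v]

        # "Compute" phase: each vertex updates its rank from received messages.
        # Rank from dangling vertices is spread evenly so the total stays 1.
        rank = {
            v: (1.0 - damping) / n + damping * (inbox[v] + dangling / n)
            for v in graph
        }
    return rank


# Hypothetical three-page web graph used only for illustration.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph, supersteps=100)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In a MapReduce implementation, each pass of the outer loop would typically be a separate job that rereads and rewrites the whole graph, which is the overhead that iteration-oriented frameworks are designed to avoid.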
PageRank with Apache Giraph,
Single-source shortest-path with Apache Giraph,
Using Apache Giraph to perform a distributed breadth-first search,
Collaborative filtering with Apache Mahout,
Clustering with Apache Mahout,
Sentiment classification with Apache Mahout.
After the session, the audience will have a basic understanding of Apache Mahout and Apache Giraph and will be able to implement basic machine learning concepts with them.
Requirements: a Hadoop environment running in pseudo-distributed mode.
Swapnil Dubey is a Big Data Hacker and Trainer at GoDataDriven. He has close to 5.5 years of experience in the software industry and is a Cloudera Certified Hadoop Developer. He has worked on several projects in the BFSI and e-commerce domains, and has worked extensively in the data quality domain using Java/J2EE and Hadoop as the primary technologies. He has an inclination toward open source technologies and likes to explore new frameworks.
Swapnil has also presented on Big Data at various national-level conferences. He recently presented two topics at the Dr. Dobb's Conference (April 2014) in Bangalore:
- Apache Flume
- Solving complex problems with Hadoop