Big Data, Real-time Processing and Storm
Submitted by Prashanth Babu (@p7h) on Thursday, 28 March 2013
Section: Workshops Technical level: Beginner
Participants will learn:
- Concepts and salient features of Storm.
- How Storm can be used for processing Big Data in real-time.
- Storm through a simple example.
- How Storm compares with Hadoop.
- A case study: Real-time analysis of tweets using Storm.
Hadoop is predominantly a batch-processing system. Have you ever wondered how to process Big Data in real-time? If yes, this workshop is for you.
To give an example, trending topics on Twitter are powered by Storm: tweets are analyzed in real-time to find the trending topics / hashtags.
This workshop will introduce the basics of Storm and its salient features. We will discuss how Storm is similar to, and different from, Hadoop. We will also walk through the source of a WordCount example and run its demo. Finally, we will discuss how Hadoop and Storm together can help process Big Data seamlessly.
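To give a feel for the WordCount example ahead of the session, here is a plain-Java sketch of the same dataflow: a "spout" emits sentences, a split "bolt" tokenizes them into words, and a count "bolt" keeps a running tally. The class and method names below are illustrative only, not Storm's API; in Storm, each stage would be a Spout or Bolt wired together with a TopologyBuilder.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordCountSketch {

    // Stands in for a spout: a fixed stream of input sentences.
    static List<String> sentenceSpout() {
        return Arrays.asList("the cow jumped over the moon",
                             "the man went to the store");
    }

    // Stands in for the split bolt: one sentence in, many words out.
    static String[] splitBolt(String sentence) {
        return sentence.split("\\s+");
    }

    // Stands in for the count bolt: accumulates per-word counts.
    // (Kept JDK 7-compatible, matching the workshop requirements.)
    static Map<String, Integer> countBolt(List<String> sentences) {
        Map<String, Integer> counts = new HashMap<>();
        for (String sentence : sentences) {
            for (String word : splitBolt(sentence)) {
                Integer current = counts.get(word);
                counts.put(word, current == null ? 1 : current + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countBolt(sentenceSpout());
        System.out.println("the=" + counts.get("the")); // prints "the=4"
    }
}
```

In the real topology, the split and count stages would run as separate bolts across many workers, with Storm grouping tuples by word so each counter sees a consistent stream.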
If time permits, we will also see a simple demo of real-time processing of tweets using Storm.
A brief outline of the session has been uploaded to SlideShare and is also embedded in the slides section below.
Please check the slide deck and let me know if you have any feedback or comments on the outline of the workshop.
Note: For this session, we will be using Storm's local mode for developing and testing the code, so any laptop with a JDK and Maven should suffice.
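For reference, pulling Storm into a Maven project looks roughly like the fragment below. This is a sketch, not the session's exact pom: Storm artifacts of this era are published on Clojars under the `storm` group, and the version shown is an assumption; use the current release.

```xml
<!-- Illustrative pom.xml fragment; the version is an assumption. -->
<repositories>
  <repository>
    <id>clojars</id>
    <url>https://clojars.org/repo</url>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>storm</groupId>
    <artifactId>storm</artifactId>
    <version>0.8.2</version>
  </dependency>
</dependencies>
```

In local mode the Storm jar must be on the compile classpath as above; when deploying to a real cluster, the dependency is typically marked `provided` instead.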
- Basic understanding of Java.
- Experience working with reasonably large chunks of data.
- Hadoop and MapReduce knowledge is good to have, but not mandatory.
- A laptop with the latest Oracle JDK 7 and Apache Maven 3.0.x installed.
- Internet connectivity.
- Twitter App:
- Participants need to create a Twitter app with read-only access on the Twitter Developer portal before the session.
- Please keep the Consumer key, Consumer secret, Access token and Access token secret of this app handy.
- We will use these credentials to retrieve tweets with Twitter4J in our code.
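One convenient way to supply these credentials is Twitter4J's standard `twitter4j.properties` file, which the library picks up automatically from the classpath. The placeholders below are yours to fill in from the Developer portal:

```properties
# twitter4j.properties -- place on the classpath; do not commit real keys.
oauth.consumerKey=YOUR_CONSUMER_KEY
oauth.consumerSecret=YOUR_CONSUMER_SECRET
oauth.accessToken=YOUR_ACCESS_TOKEN
oauth.accessTokenSecret=YOUR_ACCESS_TOKEN_SECRET
```

The same four values can also be set programmatically via Twitter4J's ConfigurationBuilder if you prefer to keep them out of a file.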
Prashanth Babu is a Research Engineer with NTT DATA. He is working on an R&D initiative on Big Data using the Apache Hadoop ecosystem. He is also a Cloudera Certified Developer for Apache Hadoop (CCDH).