The Fifth Elephant 2014

A conference on big data and analytics

Manisha Sethi

@manishasethi

Developing Real-Time Data Pipelines with Apache Kafka

Submitted Mar 27, 2014

The audience would be benefitted in terms of understanding “A High-throughput distributed Messaging system”- KAFKA, which is developed used at Linkedin.

Audience will be understanding :

What is Apache Kafka ,
What Problem Apache Kafka Solves,
Brief overview about its components,
Its High-throughput and Durable data persistence System ,
Sample Use cases,
Comparison with existing solutions,
API overview,
Kafka powered Solutions,
Q&A.

KAFKA can be in conjunction with realtime computaion systems like Storm can help us to scale at millions of records processing per second.

In nutshell, Audience will be able to understand the scenarios where kafka can be plugged in the architecture where its competitors like JMS, flume ,scribe are limiting.

kafka features of Compression and log compaction can be useful for many participants worried about network bandwidth and disk space.

Outline

The Session will have an overiew , concepts , Architecture Details of KAFKA.
Where to fit it, the benefits and features.
API discussion and a Simple Demo or application.
And The Support for Kafka from other products for integration, deployement and monitoring.

Requirements

A standard VM with JAVA >1.6 and and editor like eclipse or any preferred one.

Speaker bio

I Manisha Sethi,have been working in BigData technologies like Hadoop , YARN and NoSQL DBs for many years. With three years of experience i have got the opportuninty to work on kafka in AWS as well to handle TB,s of DATA among various DC’s. And I have also developed applications on kafka with Storma and Cassandra for real time Data Processing.
Currently Working with GODATADRIVEN- The Cloudera partners.

Comments

{{ gettext('Login to leave a comment') }}

{{ gettext('Post a comment…') }}
{{ gettext('New comment') }}
{{ formTitle }}

{{ errorMsg }}

{{ gettext('No comments posted yet') }}

Hosted by

Jump starting better data engineering and AI futures