The Fifth Elephant 2017

On data engineering and application of ML in diverse domains

Streaming for life, universe and everything using Confluent Platform

Submitted by Aastha Rai (@aastha0304) on Tuesday, 14 March 2017

Section: Crisp talk for data engineering track
Technical level: Intermediate
Status: Rejected


When Kafka arrived, it made streaming, and our lives, a lot easier. But some gaps remained: how do we validate the schema of incoming events? How do we stream from languages other than Java while keeping the streaming setup central? Can we use Kafka to stream tables into topics and vice versa? Confluent Platform (CP) is a one-stop centre for all these streaming needs. Built on top of Kafka, it provides components for schema validation, a REST API, Kafka connectors for multiple data sources, and a stream processor. The enterprise edition adds monitoring support and cross-datacentre replication (XDCR). This talk will focus on the open-source components only.
Anyone interested in analytics, data engineering or building data pipelines can leverage CP for their streaming requirements.
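To make the REST API idea from the abstract concrete, here is a minimal sketch of how a non-Java client might assemble a produce request for REST Proxy's v2 API (POST to `/topics/{topic}` with a JSON-embedded payload). This is an illustrative assumption, not material from the talk; the topic name and event fields are hypothetical.

```python
import json

def build_produce_request(topic, events):
    """Build the URL path, headers, and JSON body for a REST Proxy
    v2 produce call (POST /topics/{topic})."""
    path = f"/topics/{topic}"
    headers = {
        # JSON-embedded values; Avro payloads would use a different
        # content type and carry a schema (or schema id) alongside.
        "Content-Type": "application/vnd.kafka.json.v2+json",
    }
    body = json.dumps({"records": [{"value": e} for e in events]})
    return path, headers, body

# Hypothetical click events from a non-Java producer.
path, headers, body = build_produce_request(
    "clicks", [{"user": 42, "page": "/home"}]
)
print(path)  # /topics/clicks
```

Any HTTP client (curl, requests, urllib) can then send this request to the REST Proxy, so producers in any language stay decoupled from the Kafka wire protocol.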


1) What is Confluent Platform, and why do we need it?
2) Schemas and their importance
3) How Schema Registry (CP's central schema-validation server) works
4) How REST Proxy (CP's REST API server) can be used to produce and consume events
5) How to leverage Kafka Connect to replicate data, e.g. from MySQL to PostgreSQL (this even works with managed services, like Aurora to Redshift)
6) The principles and components of Kafka Streams (CP's stream processor)
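As a taste of item 3, the core rule Schema Registry enforces can be sketched in a few lines. This is an illustrative toy, not Schema Registry's actual implementation: it checks the Avro-style "backward" rule that a consumer on a new schema can still decode events written with the old one, which holds as long as every field the new schema adds carries a default.

```python
def is_backward_compatible(old_fields, new_fields):
    """Toy backward-compatibility check.

    Each schema is a dict mapping field name -> default value
    (None meaning "no default", a simplification of real Avro).
    A new reader schema is backward compatible if every field it
    adds relative to the old schema has a default to fall back on.
    """
    added = set(new_fields) - set(old_fields)
    return all(new_fields[f] is not None for f in added)

# Old events carry only "user"; candidate new schemas add "page".
old = {"user": None}
ok = {"user": None, "page": "/"}     # added field has a default
bad = {"user": None, "page": None}   # added field has no default

print(is_backward_compatible(old, ok))   # True
print(is_backward_compatible(old, bad))  # False
```

The real Schema Registry runs checks like this centrally on every schema registration, rejecting incompatible producers before bad events ever reach a topic.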

Speaker bio

I am a data engineer working in real-time domains such as mobile advertising and fantasy sports. Over the last four years I have used and enjoyed the Kafka ecosystem extensively, and I currently use Confluent Platform to consume and process all business, monitoring, analytics and logging events in my organization.


  • Sandhya Ramesh (@sandhyaramesh) 3 years ago

    Hi Aastha, we’re evaluating proposals currently. Could you upload your slide deck and a two minute video of you walking us through your talk? Thanks!

  • Zainab Bawa (@zainabbawa) 3 years ago

Aastha, a couple of things you need to clarify in the abstract and draft slides:
1. What exactly is the use case for which Confluent Platform (CP) works?
2. Did you evaluate any other options before choosing CP? Why was this option better than the others? What criteria did you use for evaluation?
3. What is the big picture or larger insight that you want to communicate to others in the audience who are building their own data pipelines and platforms? Surely, using CP isn't the only message you want to drive home. Is there a bigger picture to share about how you decide on tools and options when building data pipelines?

  • Govind Kanshi (@govindsk) 3 years ago

Kindly requesting you to share:
- Why Kafka to replicate data from MySQL to PostgreSQL (low latency, large volume, restrictions on data size per message, challenges and patterns to overcome data-type mismatches)?
- Request to tie the various use cases to the solutions that were built (say, the stream processor and schema registry), and then delve deeper into the bits that may not be obvious to most, along with best practices.

  • Vinayak Hegde (@vin) 2 years ago

It is hard to comment without the slide deck or more details. Based on the summary, here are a few comments:
1. What are the alternatives to Kafka? Why is Kafka better or worse? Which use cases work best with Kafka/Confluent?
2. How does the Streams API you mentioned differ from earlier abstractions? Why is it better?
3. How does this integrate with other parts of the Confluent/Kafka ecosystem, like Samza?
4. Why use Kafka to replicate from MySQL to PostgreSQL? Are you talking about the bottled-water plugin? What are the challenges in doing so? What is the motivation for using this solution, and what are the tradeoffs?
