The Fifth Elephant 2017

On data engineering and application of ML in diverse domains

Aastha Rai

@aastha0304

Streaming for life, universe and everything using Confluent Platform

Submitted Mar 14, 2017

When Kafka arrived, it made streaming, and our lives, a lot easier. But there were still gaps to fill: how do we validate the schema of incoming events? How do we stream data from languages other than Java while keeping the streaming setup central? Can we use Kafka to stream tables into topics, and topics back into tables? Confluent Platform (CP) is a one-stop centre for all of these streaming needs. It is built on top of Kafka and provides components for schema validation, a REST API, Kafka connectors for multiple data sources, and a stream processor. Its enterprise edition adds monitoring support and cross-datacenter replication (XDCR). This talk will focus on the open-source components only.
Anyone interested in analytics, data engineering or building data pipelines can leverage CP for their streaming requirements.

Outline

1) What is Confluent Platform, and why do we need it?
2) Schemas and their importance
3) How Schema Registry (CP's central schema-validation server) works; see the producer sketch after this list
4) How REST Proxy (CP's REST API server) can be used to produce and consume events; see the HTTP sketch below
5) How to leverage Kafka Connect to replicate data, e.g. from MySQL to PostgreSQL (it even works with enterprise solutions like Aurora to Redshift); see the connector config below
6) The principles and components of Kafka Streams (CP's stream processor); see the sketch below
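
To make points 2 and 3 concrete, here is a minimal sketch of producing Avro events through Schema Registry with Confluent's KafkaAvroSerializer. The topic name, schema, and URLs are illustrative assumptions; the point is that the serializer registers the record's schema with the registry and incompatible schema changes are rejected at produce time.

    import java.util.Properties;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class AvroProducerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
            // Confluent's Avro serializer talks to Schema Registry for us.
            props.put("value.serializer",
                "io.confluent.kafka.serializers.KafkaAvroSerializer");
            props.put("schema.registry.url", "http://localhost:8081");

            // Illustrative schema; in practice this would live in a .avsc file.
            Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"PageView\",\"fields\":["
                + "{\"name\":\"user\",\"type\":\"string\"},"
                + "{\"name\":\"url\",\"type\":\"string\"}]}");
            GenericRecord event = new GenericData.Record(schema);
            event.put("user", "u123");
            event.put("url", "/home");

            try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
                // The schema is registered under the subject "page-views-value";
                // an incompatible change to it would fail here, not downstream.
                producer.send(new ProducerRecord<>("page-views", "u123", event));
            }
        }
    }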
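For point 4, a sketch of producing JSON records over REST Proxy from plain Java, with no Kafka client library; this is also how non-JVM languages typically reach Kafka through CP. The port (8082) is REST Proxy's default, and the topic and record contents are made up:

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class RestProxyProduceSketch {
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://localhost:8082/topics/page-views");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            // The content type tells the proxy how records are embedded (JSON here).
            conn.setRequestProperty("Content-Type", "application/vnd.kafka.json.v2+json");
            conn.setDoOutput(true);

            String body = "{\"records\":[{\"value\":{\"user\":\"u123\",\"url\":\"/home\"}}]}";
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body.getBytes(StandardCharsets.UTF_8));
            }
            // 200 means the records were accepted; partition/offset details
            // come back in the response body.
            System.out.println("HTTP " + conn.getResponseCode());
        }
    }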
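For point 5, a sketch of the source half of a MySQL-to-PostgreSQL pipeline using the JDBC source connector that ships with CP, in Connect's standalone properties format. Connector name, connection details, table and column names are assumptions; a matching JDBC sink connector would write the resulting topic into PostgreSQL:

    name=mysql-orders-source
    connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
    connection.url=jdbc:mysql://localhost:3306/shop?user=reader&password=secret
    # Poll for new rows using a monotonically increasing id column.
    mode=incrementing
    incrementing.column.name=id
    table.whitelist=orders
    # Rows from the "orders" table land on the topic "mysql-orders".
    topic.prefix=mysql-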
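And for point 6, a minimal Kafka Streams application, written against the 0.10.x-era KStreamBuilder API that CP shipped at the time; the topic names and filter predicate are illustrative:

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KStreamBuilder;

    public class StreamsSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "pageview-filter");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
            props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

            KStreamBuilder builder = new KStreamBuilder();
            // Read raw events, keep only home-page hits, write them back out.
            KStream<String, String> views = builder.stream("page-views");
            views.filter((user, url) -> "/home".equals(url))
                 .to("home-page-views");

            KafkaStreams streams = new KafkaStreams(builder, new StreamsConfig(props));
            streams.start();
        }
    }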

Speaker bio

I am a data engineer working in real-time domains such as mobile advertising and fantasy sports. Over the last four years I have used and enjoyed the Kafka ecosystem extensively, and I am currently using Confluent Platform for consuming and processing all business, monitoring, analytics and logging events in my organization.

