The Fifth Elephant 2019

Gathering of 1000+ practitioners from the data ecosystem


Schema Registry and the nitty gritty details of schema formats


Shivji Kumar Jha


The data ecosystem has come a long way in the last decade. The ride from structured to unstructured data has been quick, and Kafka (more generally, the streaming ecosystem) has been at the forefront of that innovation. While streaming architectures started out with bits (data minus semantics) flowing through the network to offer flexibility, structure and semantics have caught up rather quickly. This is evident from Confluent's Schema Registry for Kafka and the schema registry shipped with Apache Pulsar, among others.

Schema representation, though, is a vast topic, and with multiple formats (JSON Schema, Avro, Protobuf, Thrift) it's difficult to understand what's best for you. This talk will present these formats, how they actually work under the hood to encode and decode data, and how schemas evolve over time in each of them.
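To make the "under the hood" point concrete, here is a minimal Python sketch of how Avro's binary encoding lays out a record. It follows the Avro specification rather than anything shown in the talk, and the `User` schema and all helper names are hypothetical. Per the spec, `int`/`long` values are zig-zag varints, a `string` is a length prefix followed by UTF-8 bytes, and record fields are written in schema order with no field names or tags. Only `long` and `string` are handled here.

```python
from __future__ import annotations

# Hypothetical schema for illustration (not from the talk).
USER_SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
    ],
}


def zigzag_encode(n: int) -> int:
    # Zig-zag maps signed to unsigned so small magnitudes stay small:
    # 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    return 2 * n if n >= 0 else -2 * n - 1


def zigzag_decode(z: int) -> int:
    # Inverse of zigzag_encode.
    return (z >> 1) ^ -(z & 1)


def write_long(buf: bytearray, n: int) -> None:
    # Base-128 varint: 7 payload bits per byte, high bit set on every byte but the last.
    z = zigzag_encode(n)
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            buf.append(byte | 0x80)
        else:
            buf.append(byte)
            return


def read_long(data: bytes, pos: int) -> tuple[int, int]:
    # Returns the decoded value and the position just past it.
    shift = z = 0
    while True:
        byte = data[pos]
        pos += 1
        z |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return zigzag_decode(z), pos
        shift += 7


def encode_record(schema: dict, record: dict) -> bytes:
    # Fields are written in schema order; no field names or tags are emitted.
    buf = bytearray()
    for field in schema["fields"]:
        value = record[field["name"]]
        if field["type"] == "long":
            write_long(buf, value)
        elif field["type"] == "string":
            raw = value.encode("utf-8")
            write_long(buf, len(raw))  # length prefix, then UTF-8 bytes
            buf.extend(raw)
        else:
            raise NotImplementedError(field["type"])
    return bytes(buf)


def decode_record(schema: dict, data: bytes) -> dict:
    # Decoding only works if we know the exact schema the writer used.
    record, pos = {}, 0
    for field in schema["fields"]:
        if field["type"] == "long":
            record[field["name"]], pos = read_long(data, pos)
        elif field["type"] == "string":
            length, pos = read_long(data, pos)
            record[field["name"]] = data[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise NotImplementedError(field["type"])
    return record
```

For example, `encode_record(USER_SCHEMA, {"id": 1, "name": "ab"})` yields the four bytes `b"\x02\x04ab"`. Because the bytes carry no field names, a consumer cannot decode them without the writer's schema, which is one reason a schema registry sits alongside producers and consumers.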


This talk is mostly centered around:
1. The most popular schema representation formats.
2. How these formats actually encode and decode data on the producer and consumer ends.
3. Why we chose Avro for our pipeline.
4. The important factors in choosing one of these formats.
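Schema evolution is one of the factors that can weigh on this choice. Below is a simplified Python sketch of Avro's record-resolution rules as described in the Avro specification (not taken from the talk): fields are matched by name, a field only the writer has is ignored, a field only the reader has takes the reader's default, and a reader field with no default that the writer lacks is an error. The schemas and `resolve_record` are hypothetical, and the sketch works on a dict already decoded with the writer's schema, ignoring type promotion, aliases and nested types.

```python
# Hypothetical writer (v1) and reader (v2) schemas in Avro's JSON form.
WRITER_V1 = {
    "type": "record", "name": "User",
    "fields": [{"name": "id", "type": "long"},
               {"name": "name", "type": "string"}],
}
READER_V2 = {
    "type": "record", "name": "User",
    "fields": [{"name": "id", "type": "long"},
               {"name": "email", "type": "string", "default": ""}],
}


def resolve_record(writer_schema: dict, reader_schema: dict, datum: dict) -> dict:
    # datum is assumed to be already decoded using writer_schema.
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            result[name] = datum[name]          # matched by name
        elif "default" in field:
            result[name] = field["default"]     # new field: use reader's default
        else:
            raise ValueError(f"reader field {name!r} is missing from the writer schema and has no default")
    # Writer-only fields (here "name") are simply not copied over.
    return result
```

With these schemas, `resolve_record(WRITER_V1, READER_V2, {"id": 7, "name": "ab"})` gives `{"id": 7, "email": ""}`: data written with v1 stays readable by a v2 consumer because the added field carries a default.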

Speaker bio

Shiv is a passionate engineer who loves building scalable, fault-tolerant and highly available platforms. Shiv has contributed to multiple open source projects, including Apache Pulsar, MySQL and Apache Atlas. Shiv has worked on a variety of products ranging from backend platforms to infrastructure to web applications, and loves collaborating with people, sharing and gathering knowledge through the open source community. Shiv has previously spoken at multiple open source conferences, including FOSSASIA and Open Source India.