The Fifth Elephant 2019

Gathering of 1000+ practitioners from the data ecosystem

Schema Registry and the nitty gritty details of schema formats

Submitted by Shivji Kumar Jha (@shiv4289) on Saturday, 30 March 2019

Abstract

The data ecosystem has come a long way in the last decade. The ride from structured to unstructured data has been quick, and Kafka (more generally, the streaming ecosystem) has been at the forefront of that innovation. Streaming architectures started out moving raw bits (data minus semantics) through the network in order to offer flexibility, but structure and semantics have caught up rather quickly. This is evident from Confluent’s Schema Registry for Kafka, the schema registry shipped with Apache Pulsar, and others.

Schema representation, though, is a vast topic, and with multiple formats to choose from (JSON Schema, Avro, Protobuf, Thrift) it is difficult to know which one is best for you. This talk will present these formats, how they work under the hood when encoding and decoding data, and how schemas evolve over time in each of them.
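As a rough illustration of what "under the hood" means for one of these formats, here is a minimal sketch of Avro-style binary encoding for a record with one long and one string field. It is hand-rolled for explanation only and is not the Avro library; the names `SCHEMA_FIELDS`, `encode_record` and `decode_record` are hypothetical, and only the `long` and `string` types are covered.

```python
from typing import Dict, List, Tuple

# Hypothetical two-field record schema: (field name, Avro primitive type).
SCHEMA_FIELDS: List[Tuple[str, str]] = [("a", "long"), ("b", "string")]


def encode_long(n: int) -> bytes:
    """Zigzag-encode a signed 64-bit value, then write it as a base-128 varint."""
    z = ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # more 7-bit groups follow
        else:
            out.append(byte)
            return bytes(out)


def decode_long(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read one varint starting at pos; return (value, next position)."""
    z, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos  # undo zigzag


def encode_string(s: str) -> bytes:
    """A string is its UTF-8 byte length (as a long) followed by the bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def decode_string(buf: bytes, pos: int) -> Tuple[str, int]:
    length, pos = decode_long(buf, pos)
    return buf[pos:pos + length].decode("utf-8"), pos + length


def encode_record(record: Dict[str, object], fields: List[Tuple[str, str]]) -> bytes:
    """Write field values in schema order; no names or tags go on the wire."""
    out = bytearray()
    for name, typ in fields:
        if typ == "long":
            out += encode_long(record[name])
        elif typ == "string":
            out += encode_string(record[name])
        else:
            raise ValueError(f"unsupported type in this sketch: {typ}")
    return bytes(out)


def decode_record(buf: bytes, fields: List[Tuple[str, str]]) -> Dict[str, object]:
    """The reader needs the schema to know what each run of bytes means."""
    pos, record = 0, {}
    for name, typ in fields:
        if typ == "long":
            record[name], pos = decode_long(buf, pos)
        elif typ == "string":
            record[name], pos = decode_string(buf, pos)
        else:
            raise ValueError(f"unsupported type in this sketch: {typ}")
    return record
```

Encoding `{"a": 27, "b": "foo"}` with this sketch yields the five bytes `36 06 66 6f 6f`: the payload carries no field names, which is why a consumer cannot interpret it without access to the schema.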

Outline

This talk is mostly centered around:
1. The most popular schema representation formats.
2. How these formats actually encode and decode data on producer/consumer ends.
3. Why we chose Avro for our pipeline.
4. The important factors in choosing one of these formats.
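For item 2, one common arrangement on the producer/consumer ends is to prefix each message with a reference to its schema rather than the schema itself. The sketch below assumes the framing used by Confluent's Schema Registry serializers (a zero magic byte, a 4-byte big-endian schema ID, then the encoded payload); the function names `frame` and `unframe` are hypothetical, and no registry lookup is shown.

```python
import struct
from typing import Tuple

MAGIC_BYTE = 0  # assumed framing version marker
HEADER = struct.Struct(">bI")  # 1-byte magic + 4-byte big-endian schema ID


def frame(schema_id: int, payload: bytes) -> bytes:
    """Producer side: prepend the schema reference to the encoded payload."""
    return HEADER.pack(MAGIC_BYTE, schema_id) + payload


def unframe(message: bytes) -> Tuple[int, bytes]:
    """Consumer side: split a message into (schema ID, encoded payload)."""
    if len(message) < HEADER.size:
        raise ValueError("message too short to contain a schema header")
    magic, schema_id = HEADER.unpack(message[:HEADER.size])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[HEADER.size:]
```

A consumer would use the returned schema ID to fetch the writer's schema from the registry (typically caching it) before decoding the payload.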

Speaker bio

Shiv is a passionate engineer who loves building scalable, fault-tolerant and highly available platforms. Shiv has contributed to multiple open source projects, including Apache Pulsar, MySQL and Apache Atlas. Shiv has worked on a variety of products, ranging from backend platforms to infrastructure to web applications, and loves collaborating with people, sharing and gathering knowledge through the open source community. Shiv has previously spoken at multiple open source conferences, including FOSSASIA and Open Source India.

Comments

  • Anwesha Sarkar (@anweshaalt) Reviewer 6 months ago

    Thank you for submitting the proposal. Please submit your slides and preview video by 20th April at the latest; this helps us close the review process.

  • Zainab Bawa (@zainabbawa) Reviewer 5 months ago

    Here are some comments on your proposal, Shivji:

    1. It will help to define the audience for this proposal, Shivji. You have submitted this for Rootconf too, and we would have assumed that this proposal is a better fit for Rootconf.
    2. Defining the audience – as in who should listen to the talk – will also help you to frame the context for this talk. Is the context for this talk a pitch about Avro or is it about the nitty gritty details of schema formats? You have to decide on one of the two.
    3. If the pitch is for Avro, then you have to structure this as an experience talk, explaining what the problem was for which you chose Avro, why Avro, which other choices you evaluated, and what the evaluation criteria were. You will also have to explain the before-after scenarios, showing what the situation was before Avro and what it is after Avro.
    4. Also, you have to explain the details of your pipeline and why Avro fit best for your use case. Are there pipeline architectures and design patterns where Avro doesn’t fit? This will be useful for the audience to know.
    5. Whereas, if your talk is about the nitty gritty details of schema formats, then you have to figure out how you will frame the problem statement – what is it about schema formats that is not grasped otherwise, and therefore, what is it that participants need to know?

    We’ll need to see draft slides by 27 May that help us understand your thinking, so we can assess the fit of your proposal for The Fifth Elephant. Since this proposal was submitted a while ago with no further updates, you also have to let us know if your plans have changed and if you want us to move your proposal to future editions of Rootconf and/or The Fifth Elephant.
