The Fifth Elephant 2017

On data engineering and application of ML in diverse domains

Democratising Data in the Microservices World

Submitted by Rajaram Mallya (@rajarammallya) on Saturday, 10 June 2017

Technical level

Intermediate

Section

Full talk for data engineering track

Status

Submitted

Total votes:  +4

Abstract

In the new world of microservices, every service lives independently with its own databases. Yet each service still needs data from other microservices to function, and it becomes harder and harder to run any kind of analytics or data science on all this fragmented data.
In this democratic, decentralized world, how do you empower microservice teams to build their own data pipelines? How do you enable numerous small teams to aggregate and transform real-time streams of data? And what about the frequently changing schemas of all the other microservices a team depends on? How do we make this possible without having every team learn the intricate nuances of today’s real-time streaming pipelines?

This session describes how we tackled these problems by building the right data platform tools at Gojek: how we used Kafka, Protobuf schemas and Flink aggregations to democratise data for other engineering teams within the org; how we abstracted away the nuances and common pitfalls of data pipelines so that microservice teams could concentrate on their own business logic; and how we provided a safe, decentralized environment within Gojek for all teams, from product to data science, to experiment with, transform and aggregate data in any form or fashion.
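To make the idea of "abstracting aggregations" concrete, here is a minimal sketch in plain Python. It is not Gojek's implementation and does not use the Flink or Kafka APIs; the names `Event`, `AggregationSpec` and `tumbling_aggregate` are hypothetical, and the in-memory list stands in for a real stream. It only illustrates the shape of the idea: a team declares a window size and a reducer, and a shared helper does the windowing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical event, standing in for a decoded message from a stream."""
    service: str     # producing microservice (illustrative)
    key: str         # grouping key, e.g. a city
    timestamp: int   # event time in epoch seconds
    value: float


@dataclass(frozen=True)
class AggregationSpec:
    """What a team would declare; the shared helper handles the windowing."""
    window_seconds: int
    reducer: Callable[[float, float], float]
    initial: float = 0.0


def tumbling_aggregate(
    events: Iterable[Event], spec: AggregationSpec
) -> Dict[Tuple[int, str], float]:
    """Fold events into fixed, non-overlapping (tumbling) windows per key."""
    results: Dict[Tuple[int, str], float] = {}
    for event in events:
        # Align the event time to the start of its window.
        window_start = event.timestamp - (event.timestamp % spec.window_seconds)
        slot = (window_start, event.key)
        results[slot] = spec.reducer(results.get(slot, spec.initial), event.value)
    return results


# Usage: count bookings per city per minute (assumed example data).
events = [
    Event("booking", "jakarta", 0, 1.0),
    Event("booking", "jakarta", 30, 1.0),
    Event("booking", "bandung", 45, 1.0),
    Event("booking", "jakarta", 61, 1.0),
]
count_per_minute = AggregationSpec(window_seconds=60, reducer=lambda acc, v: acc + v)
result = tumbling_aggregate(events, count_per_minute)
# result == {(0, "jakarta"): 2.0, (0, "bandung"): 1.0, (60, "jakarta"): 1.0}
```

A real streaming engine would also have to handle late and out-of-order events, state, and fault tolerance, which is exactly the kind of nuance a platform team can hide behind a declaration like `AggregationSpec`.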

At the end of this session, I hope you will take away some of the lessons we learned on how to decentralize data access, how to empower other engineering teams to consume data, and how to give them the support they need for DIY data pipelines.

Outline

  1. Challenges for data in the microservice world
    a. Fragmentation
    b. Tracking schema changes
    c. Discoverability and Accessibility
  2. The Data platform solution
    a. Kafka-centric tools
    b. Protobuf for schema management
    c. Abstracting Flink aggregations
  3. Empowering the Dev Org
    a. Ease of use of the Data platform
    b. ‘Open Sourcing’ within the org
    c. Data Discoverability

Speaker bio

I am a data geek at Gojek, Indonesia’s largest unicorn, where I work in the core data engineering team. My previous roles include an AI startup and a long stint at Thoughtworks.

Comments

  • 1
    Zainab Bawa (@zainabbawa) Reviewer a year ago

    Rajaram, you need to submit a preview video and draft slides for the editorial team to review the proposal. This information is currently missing in your submission.
