Democratising Data in the Microservices World

27 Jul 2017 (Thu), 08:15 AM – 10:00 PM IST

28 Jul 2017 (Fri), 08:15 AM – 06:25 PM IST

MLR Convention Centre, Whitefield, Bengaluru,

Submitted Jun 10, 2017

Section: Full talk for data engineering track

Technical level: Intermediate

In the new world of microservices, every service lives independently with its own databases. Yet each service still needs data from other microservices to function, and it becomes harder and harder to run any kind of analytics or data science on all this fragmented data.
In this democratic, decentralized world, how do you empower microservice teams to build their own data pipelines? How do you enable numerous small teams to aggregate and transform real-time streams of data? What about the frequently changing schemas of all the other microservices a team uses? And how do we make this possible without having every team learn the intricate nuances of today’s real-time streaming pipelines?

This session addresses how we tackled these problems by building the right data platform tools at Gojek: how we used Kafka, Protobuf schemas and Flink aggregations to democratise data for other engineering teams within the org; how we abstracted away the nuances and common pitfalls of data pipelines so that microservice teams could concentrate on their own business logic; and how we provided a safe, decentralized environment within Gojek for all teams, from product to data science, to experiment with, transform and aggregate data in any form or fashion.
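To make the aggregation idea concrete, here is a minimal sketch, in plain Python, of the kind of tumbling-window count that a stream processor such as Flink computes over a stream of events. It is an illustration only, not Gojek's implementation: `BookingEvent`, its fields, and `tumbling_window_counts` are hypothetical names, the sample events are made up, and no Kafka, Protobuf or Flink APIs are used.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BookingEvent:
    # Hypothetical event shape; on a real platform a Protobuf schema
    # would define these fields.
    service_type: str
    event_time_s: int  # event timestamp in seconds


def tumbling_window_counts(events, window_size_s):
    """Count events per (window_start, service_type).

    Windows are fixed-size and non-overlapping: an event with timestamp t
    falls into the window starting at t - (t % window_size_s).
    """
    counts = defaultdict(int)
    for event in events:
        window_start = event.event_time_s - (event.event_time_s % window_size_s)
        counts[(window_start, event.service_type)] += 1
    return dict(counts)


# Made-up sample data: three events in the first minute, one in the second.
events = [
    BookingEvent("ride", 5),
    BookingEvent("ride", 42),
    BookingEvent("food", 59),
    BookingEvent("ride", 61),
]
result = tumbling_window_counts(events, window_size_s=60)
print(result)
```

A real streaming job would also have to handle late or out-of-order events and unbounded input, which is part of the complexity a shared data platform can hide from individual teams.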

At the end of this session, I hope you will take away some of the lessons we learned on how to decentralize data access, how to empower other engineering teams to consume data and how to provide them the support they need for DIY data pipelines.

Outline

1. Challenges for data in the microservice world
a. Fragmentation
b. Tracking schema changes
c. Discoverability and Accessibility
2. The Data platform solution
a. Kafka centric tools
b. Protobuf for schema management
c. Abstracting Flink aggregations
3. Empowering the Dev Org
a. Ease of use of the Data platform
b. ‘Open Sourcing’ within the org
c. Data Discoverability

Speaker bio

I am a data geek at Gojek, Indonesia’s largest unicorn. I work in the core data engineering team. My previous experience includes an AI startup and a long stint at Thoughtworks.

The Fifth Elephant 2017