The Fifth Elephant 2017

On data engineering and application of ML in diverse domains

Rajaram Mallya

@rajarammallya

Democratising Data in the Microservices World

Submitted Jun 10, 2017

In the new world of microservices, every service lives independently with its own databases. Yet each service still needs data from other microservices to function, and it becomes harder and harder to run any kind of analytics or data science on all this fragmented data.
In this democratic, decentralized world, how do you empower microservices teams to build their own data pipelines? How do you enable numerous small teams to aggregate and transform real-time streams of data? And what about the frequently changing schemas of all the other microservices a team uses? How do we make this possible without having every team learn the intricate nuances of today’s real-time streaming pipelines?

This session addresses how we tackled all these problems by building the right data platform tools at Gojek. It covers how we used Kafka, Protobuf schemas and Flink aggregations to democratise data for other engineering teams within the org, and how we abstracted away the nuances and common pitfalls of data pipelines so that microservice teams could concentrate on their own business logic. It also covers how we provided a safe, decentralized environment within Gojek for all teams, from product to data science, to experiment with, transform and aggregate data in any form or fashion.
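To give a flavour of the kind of aggregation a microservice team might want to define without learning a streaming framework, here is a minimal sketch in plain Python. It is not Gojek's platform code and uses no Kafka, Protobuf or Flink APIs. The `BookingEvent` type, the `tumbling_window_counts` function and the sample service names are all hypothetical, and an in-memory list stands in for a real-time stream.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class BookingEvent:
    # Hypothetical event shape; on a real platform this would be
    # described by a shared schema rather than a local dataclass.
    service_type: str
    event_time_ms: int


def tumbling_window_counts(
    events: Iterable[BookingEvent], window_ms: int
) -> Dict[Tuple[int, str], int]:
    """Count events per (window start, service type) in fixed, non-overlapping windows."""
    counts: Counter = Counter()
    for event in events:
        # Align the event timestamp down to the start of its window.
        window_start = event.event_time_ms - (event.event_time_ms % window_ms)
        counts[(window_start, event.service_type)] += 1
    return dict(counts)


# Illustrative data only: four events spread over two one-minute windows.
events = [
    BookingEvent("ride", 1_000),
    BookingEvent("ride", 59_000),
    BookingEvent("food", 61_000),
    BookingEvent("ride", 119_999),
]
result = tumbling_window_counts(events, window_ms=60_000)
```

A stream processor would perform the same grouping continuously and also handle late or out-of-order events, which this sketch ignores.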

At the end of this session, I hope you will take away some of the lessons we learned on how to decentralize data access, how to empower other engineering teams to consume data, and how to give them the support they need for DIY data pipelines.

Outline

  1. Challenges for data in the microservice world
    a. Fragmentation
    b. Tracking schema changes
    c. Discoverability and Accessibility
  2. The Data platform solution
    a. Kafka-centric tools
    b. Protobuf for schema management
    c. Abstracting Flink aggregations
  3. Empowering the Dev Org
    a. Ease of use of the Data platform
    b. ‘Open Sourcing’ within the org
    c. Data Discoverability

Speaker bio

I am a data geek at Gojek, Indonesia’s largest unicorn, where I work in the core data engineering team. Before that, I worked at an AI startup and spent a long stint at Thoughtworks.
