The Fifth Elephant 2024 Annual Conference (12th & 13th July)

Maximising the Potential of Data — Discussions around data science, machine learning & AI

Niraj Mishra

Ephemeral data pipelines using Atlassian’s Lithium platform

Submitted Jun 3, 2024

There are numerous use cases that require moving large amounts of data between different systems while validating and transforming it in flight. Platforms such as Apache Flink can be excellent choices for moving and transforming data at scale, effectively through streaming ETL. However, certain use cases within Atlassian, such as ‘on-prem to cloud data migration’, ‘cloud to cloud data migration’, ‘backup and restore’, and ‘sandbox’, present unique challenges. They require stream processing pipelines to be provisioned entirely at runtime, including dedicated Kafka topics, parallelism settings, and the selection of stream processors with appropriate and available compute resources. Stream processors must be embedded directly in product and platform services to enable in-process access to service context, in order to meet functional requirements along with throughput goals. In addition, our pipelines require coordination among sources, transforms, and sinks that live in different product and infrastructure services. This led us to build the Lithium platform, which is 100% event driven and built on Kafka and Kafka Streams.

Lithium is a platform that leverages Kafka Streams based ETL pipelines to enable rapid and scalable data movement. It is designed to transfer data seamlessly between different points with maximum efficiency and reliability, and offers advanced capabilities such as decoupled ETL components, highly customizable job configuration options, in-progress remediation, and continuous job monitoring.
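
To make the idea of decoupled ETL components concrete, here is a minimal sketch of an extract/transform/load pipeline with independently replaceable stages. The record shape and stage functions are our own illustration, not Lithium's actual wire format or API:

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List

# Hypothetical record type; Lithium's actual record schema is internal.
@dataclass
class Record:
    key: str
    value: dict

def extract(rows: Iterable[dict]) -> Iterator[Record]:
    """Source stage: wrap raw rows from the origin system into records."""
    for i, row in enumerate(rows):
        yield Record(key=str(i), value=row)

def transform(records: Iterable[Record]) -> Iterator[Record]:
    """Transform stage: validate and reshape records in flight."""
    for rec in records:
        if "id" not in rec.value:  # drop rows that fail validation
            continue
        yield Record(rec.key, {**rec.value, "migrated": True})

def load(records: Iterable[Record], sink: List[Record]) -> None:
    """Sink stage: deliver records to the destination system."""
    sink.extend(records)

sink: List[Record] = []
load(transform(extract([{"id": 1}, {"bad": True}, {"id": 2}])), sink)
print(len(sink))  # → 2 (the invalid row was dropped in flight)
```

Because each stage only consumes and produces records, any stage can be swapped out without touching the others, which is the decoupling property the abstract describes.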

Outline

Introduction to Lithium Platform

We will introduce the Lithium platform, explain why it was built at Atlassian and which use cases it currently solves, and introduce the concept of ephemeral data pipelines, a.k.a. Workplans.

Lithium Control plane

We will introduce the Lithium Control Plane, walk through its various components, and discuss what it is responsible for in managing data pipelines.

Lithium Data Plane

The Lithium Data Plane consists of the components responsible for moving data through the pipeline. We will go over the various components of a Data Plane, and also cover how we have designed self-hosted Data Plane components, or BYOH (Bring Your Own Host), with the help of the Lithium SDKs.

Control plane and data plane communication

We will discuss how the Control Plane and Data Plane communicate with each other, and what kinds of events are exchanged between these systems to keep the data pipelines up and running.
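
The command/status exchange between the two planes can be sketched as a worker reacting to control events and emitting status events back. The event kinds and fields below are our own illustration; Lithium's internal event schema is not public:

```python
from dataclasses import dataclass

# Hypothetical event envelope; real Lithium events are Kafka messages
# with an internal schema.
@dataclass
class Event:
    kind: str          # e.g. "START", "PAUSE", "STATUS"
    workplan_id: str
    payload: dict

class DataPlaneWorker:
    """Toy data-plane worker that reacts to control-plane commands."""

    def __init__(self) -> None:
        self.state = "IDLE"

    def handle(self, event: Event) -> Event:
        """Apply a command and emit a status event back to the control plane."""
        if event.kind == "START":
            self.state = "RUNNING"
        elif event.kind == "PAUSE":
            self.state = "PAUSED"
        return Event("STATUS", event.workplan_id, {"state": self.state})

worker = DataPlaneWorker()
ack = worker.handle(Event("START", "wp-1", {}))
print(ack.payload["state"])  # → RUNNING
```

In the real platform both directions of this exchange flow through Kafka topics rather than direct method calls, which keeps the planes fully decoupled.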

Custom source, transform, validation, sink, and workplan state processors

Lithium provides simple APIs through the Data Plane SDKs to write custom Data Plane processors (extract, transform, load, etc.). Together, these processors make up the Data Plane and run the custom logic within the data pipelines.
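
As a rough sketch of what an SDK-style processor API can look like, the base class and the `UserAnonymizer` example below are hypothetical; Lithium's actual SDK interfaces are internal to Atlassian:

```python
from abc import ABC, abstractmethod
from typing import Optional

# Hypothetical SDK shape: a transform processor returns the transformed
# record, or None to drop it from the pipeline.
class TransformProcessor(ABC):
    @abstractmethod
    def process(self, record: dict) -> Optional[dict]:
        ...

class UserAnonymizer(TransformProcessor):
    """Example custom transform a service team might plug into a pipeline."""

    def process(self, record: dict) -> Optional[dict]:
        if "email" not in record:
            return None  # drop records without the field we need
        return {**record, "email": "redacted@example.com"}

proc = UserAnonymizer()
print(proc.process({"id": 7, "email": "a@b.c"}))
```

The value of this pattern is that the platform owns the plumbing (topics, offsets, retries) while teams only implement the `process` hook with their service-specific logic.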

Dynamic Resource Provisioning in the Data Plane and the Auctioning Model

The Lithium Data Plane implements an auctioning model to provision the compute resources required for a particular job; all resources, including the Kafka topics, are provisioned on demand. We will walk through how this has been implemented in Lithium.
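
One simple way to picture an auctioning model: candidate workers bid with their available capacity, and the control plane awards the job to the best eligible bid. The selection rule below (most free slots wins) is purely illustrative, not Lithium's actual policy:

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative auction model; the real bid contents and award criteria
# in Lithium are internal.
@dataclass
class Bid:
    worker_id: str
    free_task_slots: int

def award(bids: List[Bid], required_slots: int) -> Optional[Bid]:
    """Award the job to the eligible worker with the most spare capacity."""
    eligible = [b for b in bids if b.free_task_slots >= required_slots]
    return max(eligible, key=lambda b: b.free_task_slots, default=None)

bids = [Bid("worker-a", 2), Bid("worker-b", 8), Bid("worker-c", 4)]
winner = award(bids, required_slots=4)
print(winner.worker_id)  # → worker-b
```

The same idea extends naturally to richer bids (CPU, memory, data locality), with the award function encoding whatever placement policy the platform needs.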

Data pipeline pause, resume, rewind, and in-flight data remediation via sidelining

We will discuss how data pipelines can be paused, resumed, and rewound by issuing commands to the Control Plane. We will also discuss Lithium’s sidelining capabilities and how they help with in-flight remediation in the pipeline.
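
The core idea behind sidelining can be sketched in a few lines: records that fail validation are diverted to a sideline store instead of failing the whole pipeline, then remediated and replayed. The function and names below are our own illustration (in Lithium the sideline would be a Kafka topic, not a Python list):

```python
from typing import Callable, List

def run_with_sideline(records: List[dict],
                      validate: Callable[[dict], bool],
                      sideline: List[dict]) -> List[dict]:
    """Deliver valid records; divert failures to the sideline for later
    remediation instead of halting the pipeline (illustrative only)."""
    delivered: List[dict] = []
    for rec in records:
        (delivered if validate(rec) else sideline).append(rec)
    return delivered

records = [{"id": 1}, {"id": None}, {"id": 3}]
sideline: List[dict] = []
ok = run_with_sideline(records, lambda r: r["id"] is not None, sideline)
# After remediation, sidelined records are fixed up and replayed:
fixed = [{**r, "id": 2} for r in sideline]
ok += run_with_sideline(fixed, lambda r: r["id"] is not None, sideline)
print(len(ok))  # → 3 (all records eventually delivered)
```

The benefit is that one bad record no longer forces an operator to abort and restart a long-running migration; the pipeline keeps moving while the sidelined data is repaired in flight.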


