The Fifth Elephant 2024 Annual Conference (12th & 13th July)
Maximising the Potential of Data — Discussions around data science, machine learning & AI
Niraj Mishra
There are numerous use cases that require moving large amounts of data between different systems while validating and transforming it in-flight. Platforms such as Apache Flink can be excellent choices for moving and transforming data at scale, effectively through streaming ETL. However, certain use cases within Atlassian, such as 'on-prem to cloud data migration', 'cloud to cloud data migration', 'backup and restore', and 'sandbox', present unique challenges. They require stream processing pipelines to be provisioned entirely at runtime, including dedicated Kafka topics, parallelism, and the selection of stream processors with appropriate and available compute resources. Stream processors must be embedded directly in product and platform services to enable in-process access to service context, in order to meet functional requirements along with throughput goals. In addition, our pipelines require coordination among sources, transforms, and sinks that live in different product and infrastructure services. This led us to build the Lithium Platform: it is 100% event-driven and built on Kafka and Kafka Streams.
Lithium is a platform that leverages Kafka Streams based ETL pipelines to enable rapid and scalable data movement. It is designed to transfer data seamlessly between different points with maximum efficiency and reliability, and offers advanced capabilities such as decoupled ETL components, highly customizable job configuration, in-flight remediation, and continuous job monitoring.
We will introduce the Lithium platform, explain why it was built at Atlassian, and give an overview of the use cases it currently solves. We will also introduce the concept of ephemeral data pipelines, a.k.a. Workplans.
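As a rough illustration of the idea, here is a minimal Java sketch of what an ephemeral pipeline definition might look like. The "Workplan" name comes from the talk, but every type, field, and value below is a hypothetical assumption for illustration, not the actual Lithium API.

import java.time.Duration;
import java.util.List;

// Hypothetical illustration of an ephemeral pipeline definition ("Workplan").
// All names and fields here are assumptions made for this example.
public record Workplan(
        String id,                 // unique id for this runtime-provisioned pipeline
        String sourceService,      // service that owns the extract step
        String sinkService,        // service that owns the load step
        List<String> transforms,   // ordered transform steps applied in-flight
        int parallelism,           // number of stream tasks / topic partitions
        Duration ttl               // pipeline is torn down after this window
) {
    public static Workplan forCloudMigration(String tenantId) {
        return new Workplan(
                "migration-" + tenantId,
                "onprem-export-service",
                "cloud-import-service",
                List.of("schema-validate", "id-remap"),
                8,
                Duration.ofHours(6));
    }
}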
We will introduce the Lithium Control Plane, walk through its various components, and discuss what it is responsible for in managing data pipelines.
The Lithium Data Plane consists of the components responsible for moving data through the pipeline. We will go over its various components, and also cover how we have designed self-hosted Data Plane components, or BYOH (Bring Your Own Host), with the help of the Lithium SDKs.
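A minimal sketch of the BYOH idea follows, assuming a hypothetical in-process worker type; the real Lithium SDK surface is not public, so all names here are invented for illustration.

// Hypothetical sketch of BYOH (Bring Your Own Host): a product service embeds
// a data plane worker so processors run in-process, with direct access to the
// service's own context. The Worker type is invented; it is not the Lithium SDK.
public final class ByohExample {
    static final class Worker {
        private final String hostId;

        Worker(String hostId) { this.hostId = hostId; }

        void start() {
            // A real worker would register with the control plane here and
            // begin bidding for workplan steps this host is capable of running.
            System.out.println(hostId + " registered as a data plane host");
        }

        void stop() {
            System.out.println(hostId + " deregistered");
        }
    }

    public static void main(String[] args) {
        var worker = new Worker("onprem-export-service-1");
        worker.start();
        Runtime.getRuntime().addShutdownHook(new Thread(worker::stop));
    }
}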
We will discuss how the Control Plane and the Data Plane communicate with each other, and what kinds of events are exchanged between these systems to keep the data pipelines up and running.
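For a flavour of what such an exchange could look like, the sketch below defines some assumed event types and publishes one to an assumed control topic using the standard Kafka producer API. Lithium's actual event protocol, event names, and topic names are not specified in the talk.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// Hypothetical sketch of the kind of events a control plane and data plane
// might exchange over Kafka. Event names and the topic are assumptions.
public final class ControlPlaneEvents {
    enum EventType { WORK_REQUESTED, BID_SUBMITTED, WORK_ASSIGNED,
                     STEP_STARTED, STEP_COMPLETED, HEARTBEAT, PIPELINE_FAILED }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (var producer = new KafkaProducer<String, String>(props)) {
            // A data plane host reports progress for one workplan step.
            var payload = "{\"workplanId\":\"migration-42\",\"step\":\"transform\"}";
            producer.send(new ProducerRecord<>("lithium-control-events",
                    EventType.STEP_COMPLETED.name(), payload));
        }
    }
}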
Lithium provides simple APIs through the Data Plane SDKs for writing custom data plane processors (extract, transform, load, etc.). Together, these processors make up the Data Plane and run the custom logic within the data pipelines.
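As an illustration only, the sketch below assumes a minimal processor interface; the real SDK API may differ in shape and naming.

// Hypothetical sketch of what a custom data plane processor might look like.
// The Processor interface below is an assumption for illustration.
interface Processor<I, O> {
    O process(I record) throws Exception;  // one record in, one record out
}

// A transform step: validate and normalize a record in-flight.
final class RecordNormalizer implements Processor<String, String> {
    @Override
    public String process(String rawJson) {
        // Real code would parse JSON and apply schema validation; this
        // placeholder just lowercases the payload to show the shape.
        if (rawJson == null || rawJson.isBlank()) {
            throw new IllegalArgumentException("empty record");
        }
        return rawJson.toLowerCase();
    }
}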
The Lithium Data Plane implements an auctioning model to provision the compute resources required for a particular job; all resources, including the Kafka topics, are provisioned on demand. We will walk through how this has been implemented in Lithium.
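Here is a minimal sketch of an auctioning model, assuming hosts advertise spare capacity as bids and the control plane picks the bid with the most headroom. The actual scoring rule Lithium uses is not described in the talk.

import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical auction: data plane hosts submit bids describing free capacity,
// and the control plane assigns the job to the best eligible bid.
public final class Auction {
    record Bid(String hostId, int freeCpuMillis, int freeMemoryMb) {}

    static Optional<Bid> pickWinner(List<Bid> bids, int cpuNeeded, int memNeeded) {
        return bids.stream()
                .filter(b -> b.freeCpuMillis() >= cpuNeeded && b.freeMemoryMb() >= memNeeded)
                .max(Comparator.comparingInt(Bid::freeCpuMillis)); // most headroom wins
    }

    public static void main(String[] args) {
        var bids = List.of(new Bid("host-a", 500, 2048),
                           new Bid("host-b", 1500, 4096));
        System.out.println(pickWinner(bids, 1000, 1024)); // host-b wins
    }
}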
We will discuss how data pipelines can be paused, resumed, and rewound by issuing commands to the Control Plane. We will also discuss Lithium's sidelining capabilities and how they help with in-flight remediation in the pipeline.
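The sketch below illustrates, under assumed names, how a data plane step might honor a pause command and park failing records on a sideline topic for later remediation instead of failing the whole pipeline. The command set matches the talk (pause, resume, rewind); everything else is an invented illustration, not Lithium's implementation.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of pipeline lifecycle commands plus sidelining.
public final class PipelineControl {
    enum Command { PAUSE, RESUME, REWIND }

    private final Map<String, Command> state = new ConcurrentHashMap<>();

    void onCommand(String workplanId, Command cmd) { state.put(workplanId, cmd); }

    // Called by a processor for each record; returns true if the record was
    // sidelined rather than processed.
    boolean processOrSideline(String workplanId, String record) {
        if (state.get(workplanId) == Command.PAUSE) {
            throw new IllegalStateException("pipeline " + workplanId + " is paused");
        }
        try {
            transform(record);   // the step's actual logic would run here
            return false;
        } catch (RuntimeException bad) {
            sideline(workplanId, record, bad.getMessage());
            return true;
        }
    }

    private void transform(String record) {
        if (record == null || record.isBlank()) {
            throw new IllegalArgumentException("record failed validation");
        }
        // ... real transform work ...
    }

    private void sideline(String workplanId, String record, String reason) {
        // In a real system this might produce to a dedicated sideline Kafka
        // topic so operators can fix and replay the records in-flight.
        System.out.printf("sidelined [%s] %s: %s%n", workplanId, record, reason);
    }
}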