Adarsh Mysore Thimmappa

Jira cloud data extraction @ scale

Submitted Jun 3, 2024

Cloud data extraction is a subset of data engineering: it is the process of retrieving data from cloud-based applications and services for analysis, reporting, or storage in a centralized data repository.
Atlassian’s data extraction solution has evolved significantly over the years to meet the demands of enterprise-grade customers. It began with full tenant database copying, moved to batch extraction over RESTful APIs via dedicated data extraction services, and now uses Atlassian’s own streaming solution, the Lithium platform. This evolution has enabled efficient extraction, transformation, and ingestion of data at scale. Two types of data extraction are common: Full Data Extract (FDE) and Partial/Selective Data Extract (PDE). FDE copies the entire tenant database, while PDE offers flexibility by allowing specific data to be selected during extraction. The result is a robust and efficient solution for enterprise-grade customers who have to deal with large volumes of data.
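To make the FDE/PDE distinction concrete, here is a minimal Python sketch. It is an illustration only and does not represent Atlassian's services or the Lithium platform: the `Record` type, the `extract` function, the `selector` parameter, and the batch size are all hypothetical names chosen for this example. A full extract is modeled as "no selector", and a partial extract as a predicate over records, with results emitted in batches.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List, Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical tenant record; field names are assumptions for illustration."""
    entity_type: str          # e.g. "issue" or "comment"
    project: str              # e.g. "OPS"
    payload: dict = field(default_factory=dict, compare=False)


def extract(
    records: Iterable[Record],
    selector: Optional[Callable[[Record], bool]] = None,
    batch_size: int = 2,
) -> Iterator[List[Record]]:
    """Yield records in batches.

    selector=None models a Full Data Extract (every record is copied);
    a selector function models a Partial/Selective Data Extract.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[Record] = []
    for record in records:
        if selector is None or selector(record):
            batch.append(record)
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch
```

Under these assumptions, `extract(tenant_records)` would behave like an FDE, while `extract(tenant_records, selector=lambda r: r.project == "OPS")` would behave like a PDE restricted to one project.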

Two enterprise-grade use cases highlight the purpose of a scalable data extraction solution:

  • Moving data from one cloud instance to another
  • Backing up data from one cloud instance so that it can be restored, on demand, into the same or another cloud instance

Outline

  • The early days of data extraction
  • Evolution of the cloud data extraction approach over the last couple of years
  • Challenges associated with the evolution of data extraction
  • Path to achieving cloud data extraction at scale

Impact

The scalable data extraction approach handles large data sets without compromising performance and optimizes resource utilization, among other benefits.

Hosted by

All about data science and machine learning

Supported by

Gold Sponsor

Atlassian unleashes the potential of every team. Our agile & DevOps, IT service management and work management software helps teams organize, discuss, and compl

Silver Sponsor