The Fifth Elephant 2018

The seventh edition of India's best data conference

Vineeti Louis

@vineetilouis

Baking a cloud-native data warehouse from enterprise database leftovers

Submitted Mar 27, 2018

dataxu® deals with collection, storage, processing, analysis, and projection of data at massive scale.

To meet the growing need for interactive analysis and querying, we adopted an MPP database as our warehouse solution. This on-premises solution served us well as the cluster scaled over the initial years. However, as the business grew, we ran into significant operational challenges, such as constant hardware and software maintenance and unpredictable load times that resulted in SLA misses. To address these pain points and deliver sustainably, we chose to leverage the cloud for its scaling capabilities. The bedrock of the new Reporting system is an Apache Spark-driven ETL solution over an S3 data lake.

In this talk, we will focus on the challenges posed by the existing system and design choices that were made in our quest for a new system.

Outline

  • What we do at dataxu
  • dataxu’s Reporting infrastructure
  • Key challenges
  • Design choices for new system
  • Recipe for a cloud-native data warehouse
  • Key benefits
  • Takeaways

Speaker bio

Vineeti Louis has been working with dataxu for the last 3 years. She has worked with the Reporting Services team at dataxu, where she was involved in building the reporting system on an enterprise data warehouse and later moving it to the cloud.
https://www.linkedin.com/in/vineeti-louis-b8275546/

Slides

https://docs.google.com/presentation/d/1a_euHDzzONwJXAkcuUFm9591lkRJOexM3jJ_SYXc16w/edit?usp=sharing
