The Fifth Elephant 2018

The seventh edition of India's best data conference

Baking a cloud-native data warehouse from enterprise database leftovers

Submitted by Vineeti Louis (@vineetilouis) on Tuesday, 27 March 2018


Technical level

Intermediate

Section

Crisp talk

Status

Submitted

Total votes:  +12

Abstract

dataxu® collects, stores, processes, analyzes, and projects data at massive scale.

To meet growing needs for interactive analysis and querying, we adopted a massively parallel processing (MPP) database as our warehouse solution. This on-premise solution served us well as the cluster scaled over the initial years. However, as the business grew, we ran into significant operational challenges: constant hardware and software maintenance, and unpredictable load times that resulted in missed SLAs. To address these pain points and deliver sustainably, we chose to move to the cloud for its scaling capabilities. The bedrock of the new Reporting system is an Apache Spark-driven ETL solution over an S3 data lake.

In this talk, we will focus on the challenges posed by the existing system and the design choices we made while building its replacement.

Outline

  • What we do at dataxu
  • dataxu’s Reporting infrastructure
  • Key challenges
  • Design choices for the new system
  • Recipe for a cloud-native data warehouse
  • Key benefits
  • Takeaways

Speaker bio

Vineeti Louis has worked at dataxu for the last three years. As a member of the Reporting Services team, she helped build the reporting system on the enterprise data warehouse and later moved it to the cloud.
https://www.linkedin.com/in/vineeti-louis-b8275546/

Slides

https://docs.google.com/presentation/d/1a_euHDzzONwJXAkcuUFm9591lkRJOexM3jJ_SYXc16w/edit?usp=sharing

Preview video

https://photos.app.goo.gl/5TcKsc7hy4n4Lxqn1
