Baking a cloud-native data warehouse from enterprise database leftovers
Submitted by Vineeti Louis (@vineetilouis) on Tuesday, 27 March 2018
dataxu® deals with the collection, storage, processing, analysis, and projection of data at massive scale.
To meet the growing need for interactive analysis and querying, we adopted an MPP database as our warehouse solution. This on-premises solution served us well as the cluster scaled over the initial years. However, as the business grew, we ran into significant operational challenges, such as constant hardware and software maintenance and unpredictable load times that resulted in SLA misses. To address these pain points and deliver sustainably, we chose to leverage the cloud for its scaling capabilities. The bedrock of the new Reporting system is an Apache Spark-driven ETL solution over an S3 data lake.
In this talk, we will focus on the challenges posed by the existing system and the design choices we made in our quest for a new one.
- What we do at dataxu
- dataxu’s Reporting infrastructure
- Key challenges
- Design choices for the new system
- Recipe for a cloud-native data warehouse
- Key benefits
Vineeti Louis has been working with dataxu for the past three years. As part of the Reporting Services team, she was involved in building the reporting system on an enterprise data warehouse and later moving it to the cloud.