Challenges in Data Warehouse Augmentation on Hadoop
Submitted by Koteswara Vemu (@koteswara) on Saturday, 30 April 2016
Enterprises these days are finding value in moving their traditional data warehouses into augmented and historical data stores on Hadoop. This requires continuous data synchronisation between the traditional data warehouse and the data on Hadoop. It is also an added advantage to maintain slowly changing dimensions when data is ingested onto Hadoop from traditional database systems. Once this data is available in the Hadoop environment, it is integrated with downstream systems such as the analytics layer. This talk discusses the challenges in data ingestion, continuous synchronisation, and providing slowly changing dimensions for incremental data.
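To make the "slowly changing dimensions" idea concrete, here is a minimal sketch of Type 2 SCD logic for an incremental load: changed dimension rows are closed out and a new current version is appended, so history is preserved. This is a generic illustration, not the speaker's implementation; all field names (`key`, `attrs`, `valid_from`, `valid_to`) are assumptions.

```python
from datetime import date

def apply_scd2(dimension, incoming, today=None):
    """Type 2 slowly-changing-dimension merge (illustrative sketch).

    `dimension`: list of dicts with keys: key, attrs, valid_from,
    valid_to (None marks the current version of a row).
    `incoming`: mapping of key -> latest attrs from the source system.
    """
    today = today or date.today().isoformat()
    current = {r["key"]: r for r in dimension if r["valid_to"] is None}
    for key, attrs in incoming.items():
        row = current.get(key)
        if row is None:
            # brand-new key: insert as the current version
            dimension.append({"key": key, "attrs": attrs,
                              "valid_from": today, "valid_to": None})
        elif row["attrs"] != attrs:
            # changed attributes: close the old version, open a new one
            row["valid_to"] = today
            dimension.append({"key": key, "attrs": attrs,
                              "valid_from": today, "valid_to": None})
        # unchanged rows are left untouched, preserving their history
    return dimension
```

In a Hadoop setting the same merge is typically expressed as a join between the existing dimension table and the incremental batch (e.g. in Hive or Spark) rather than in-memory Python, but the close-and-append pattern is the same.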
– Advantages of data warehousing on Hadoop
– How to design a parallel and distributed data ingestion architecture
– How to tune the system to meet tighter SLAs
– How to operate a change data process at scale
– Optimal data storage formats for downstream usage
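One common building block behind "operating a change data process at scale" is watermark-based incremental extraction: each sync cycle reads only rows modified after the last persisted watermark, lands them in batches, and advances the watermark only after a batch is safely written. The sketch below assumes hypothetical callables `fetch_since` and `land_batch` and a `modified_at` column; none of these names come from the talk.

```python
def sync_increment(fetch_since, land_batch, watermark, batch_size=1000):
    """One incremental sync cycle (illustrative sketch).

    `fetch_since(ts)` is assumed to return source rows with
    modified_at > ts, sorted by modified_at (in practice a JDBC or
    Sqoop-style query against the source warehouse).
    `land_batch(rows)` is assumed to write a batch to Hadoop.
    Returns the new watermark to persist for the next cycle.
    """
    rows = fetch_since(watermark)
    for i in range(0, len(rows), batch_size):
        batch = rows[i:i + batch_size]
        land_batch(batch)
        # advance only after the batch has landed: a crash mid-run then
        # means re-reading already-synced rows (idempotent) rather than
        # silently dropping changes
        watermark = max(r["modified_at"] for r in batch)
    return watermark
```

Batches can be fetched and landed in parallel across table partitions, which is where the parallel and distributed ingestion architecture mentioned above comes in.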
Koteswara is a Principal Engineer on the data ingestion team at Infoworks.io, an enterprise big data warehousing start-up based in Silicon Valley. At Infoworks.io he designed a next-generation enterprise module for ingesting data into Hadoop, which currently runs on premises at the organisation's Fortune clients. He has also worked as a Chief Engineer at Samsung and as a Lead Research Engineer at Siemens. He holds a master's degree in computer science from the Indian Institute of Science, Bangalore.