The Fifth Elephant 2015
A conference on data, machine learning, and distributed and parallel computing
Jul 2015
16 Thu 08:30 AM – 06:35 PM IST
17 Fri 08:30 AM – 06:30 PM IST
18 Sat 09:00 AM – 06:30 PM IST
The talk aims to give data builders the key building blocks they need to create their own frameworks and tools, add transparency to their data pipelines, and ship faster.
Most organizations that work with data do so on technologies such as Hadoop, Spark, or Vertica. These technologies let organizations process data, but nearly all of those organizations also maintain their own code bases and frameworks, which builders use to clean, process, and query that data. While building Starscream (Shopify’s dimensional modelling framework on top of Spark), we learnt several lessons about building blocks that do not ship with these technologies yet are critical to the smooth functioning and transparency of our data pipeline. The talk presents these building blocks, such as metadata and incremental builds, along with their use cases and how they helped Shopify ship faster.
Basic experience with processing data
Yagnik is a software developer at Shopify.
linkedin.com/in/yagnik