The Fifth Elephant 2016

India's most renowned data science conference

Don’t just build a data lake, build a data powerhouse.

Akash Mishra

@sleepythread

Companies are now trying to become data-oriented and to make decisions based on data.

The first step towards data-oriented decision making is to collect data. The data lake has become one of the recent buzzwords in the big data industry. Most of the time, companies first try to build a data lake that will contain all their data. More often than not, dumping data into the data lake translates into exporting all the data from various RDBMS databases [e.g. Orders, Inventory] and scraping all the logs into the data lake. Once all the relevant data is in the data lake, various processing applications are written to extract information out of the source data. This approach has many problems associated with it [e.g. huge upfront cost, missing information that is not currently tracked, etc.].
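As a rough illustration of the dumping approach (my own sketch, not from the talk; the table names, connection string and output path are hypothetical), a typical ingestion job simply copies whole RDBMS tables into the lake's storage:

```python
# A minimal sketch of the "dump everything" style of ingestion.
# Table names, connection string and output path are hypothetical.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:pass@db-host/shop")

for table in ["orders", "inventory", "customers"]:
    # Full table export; re-run periodically, so the lake always lags the source
    # and only captures whatever the source systems happen to store.
    df = pd.read_sql_table(table, engine)
    df.to_parquet(f"/data-lake/raw/{table}.parquet", index=False)
```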

In this talk I will propose another approach for data-driven systems where, instead of dumping all the data into a central location, we identify the events/interactions/facts in the company [e.g. an add-to-cart event, viewing a product, etc.] and store them for processing. I will explain how this approach is much more result-oriented and agile than the dumping approach.
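To make the contrast concrete, here is a minimal sketch of capturing an add-to-cart interaction as an explicit event at the moment it happens, rather than reconstructing it later from database rows. This is my own illustration, not the speaker's implementation; the field names and the append-only log are assumptions.

```python
# A minimal sketch of event/interaction tracking.
# Field names and the append-only JSON-lines log are illustrative assumptions.
import json
import time
import uuid

def emit_event(event_type, payload, log_path="/data-lake/events/events.jsonl"):
    """Append one immutable fact to an append-only event log."""
    event = {
        "event_id": str(uuid.uuid4()),
        "event_type": event_type,      # e.g. "add_to_cart", "product_viewed"
        "timestamp": time.time(),
        "payload": payload,
    }
    with open(log_path, "a") as log:
        log.write(json.dumps(event) + "\n")
    return event

# Usage: record the interaction as it happens, with exactly the fields that matter.
emit_event("add_to_cart", {"user_id": 42, "product_id": "SKU-123", "quantity": 1})
emit_event("product_viewed", {"user_id": 42, "product_id": "SKU-456"})
```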

Outline

  • Data Lake, the traditional way:
    • Some current architectures for building a data lake.
    • Problems associated with this approach.
    • A real-life example.
  • What are events/interactions/facts?
    • Explaining the terminology.
    • Reasons to track them.
    • Defining:
      • Business events
      • Developer events
      • Monitoring events
  • Use Case Driven Development
  • Proposed Architecture
  • Benefits of the proposed architecture:
    • Business stakeholders
    • Developers
    • Monitoring

Speaker bio

Akash Mishra is currently working as a Data Engineer at Badoo Trading Limited and has more than 4 years of experience building large-scale big data applications for various clients of ThoughtWorks Technologies. He has production experience with big data technologies like Spark, Hadoop, Mesos, etc. He is a passionate developer with a deep interest in distributed systems. He has co-organised Big Data Meetups for Pune & NCR, given various talks at meetups and Geek Night, and contributed to the Apache Spark project.

Up next

Distributed Computing Abstractions for Big Data Science

Vijay Srinivas Agneeswaran, Ph.D

The data science field has made significant advances in the last few years, with a renewed focus on getting data science to work at scale. The talk shall outline the distributed computing abstractions required to realize data science at scale. The Resilient Distributed Dataset (RDD) abstraction provided by Spark is becoming a de-facto approach for big data science. However, Apache Flink and, more recently, Concord have emerged as interesting alternatives to Spark and provide streaming dataflow abstractions: while Spark can achieve real-time analytics by mini-batching, Flink offers event streaming as a first-class abstraction and provides exactly-once guarantees.

TensorFlow also provides a dataflow abstraction for deep learning networks. TensorFlow has recently released a distributed version that uses gRPC or integrates with cluster management systems such as Kubernetes.

Graph processing abstractions are useful for realizing complex algorithms on large real-life power-law graphs such as the Twitter or LinkedIn graphs. GraphLab and Titan are the prominent graph processing systems. GraphLab provides an efficient partitioning mechanism to split a large graph across a cluster of nodes and run algorithms at scale. It must be noted that common machine learning algorithms such as clustering or classification, as well as deep learning, can be realized on top of graph processing abstractions. The Titan graph DB has very good integration with several NoSQL data sources, including Cassandra and HBase, as well as with processing engines for machine learning, including Spark, Giraph and Hadoop. We also outline our experience of implementing machine learning and deep learning algorithms over many of these abstractions.
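For reference, this is what the RDD abstraction mentioned above looks like in practice: a minimal PySpark sketch (my own illustration, not from the talk) in which a dataset is treated as an immutable, partitioned collection transformed with functional operators.

```python
# A minimal PySpark sketch of the RDD abstraction: an immutable,
# partitioned collection transformed with functional operators.
from pyspark import SparkContext

sc = SparkContext(appName="rdd-sketch")

# Distribute a small collection across the cluster, then transform and aggregate it.
events = sc.parallelize(["view", "cart", "view", "purchase", "view"])
counts = (events
          .map(lambda e: (e, 1))
          .reduceByKey(lambda a, b: a + b)
          .collect())

print(counts)  # e.g. [('view', 3), ('cart', 1), ('purchase', 1)]
sc.stop()
```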

Jun 9, 2016
