The Fifth Elephant 2017

On data engineering and application of ML in diverse domains

Ketan Khairnar

@ketankhairnar

Unless you measure it; you can’t improve it - Data pipelines for your business KPIs and KRAs

Submitted Jun 8, 2017

Abstract

Any business can gain an unfair advantage through actionable insights using data pipelines and some common sense. We already experience this in our interactions online (Amazon, medium.com) and through mobile apps (Uber, Ola, and many more).

Why is data infrastructure important?

The important advantages of data pipelines are:

  • a surge in productivity - through a clean CQRS interface
  • informed decision making - through trends, aggregations, leaderboards
  • a platform for A/B testing your business goals
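
As a rough illustration of the trends, aggregations, and leaderboards mentioned above, the sketch below rolls a handful of booking events up into per-day counts and a top-cities leaderboard. The event fields (`city`, `day`, `amount`), the function names, and the sample data are hypothetical, chosen to fit an Airbnb-style clone; they are not taken from the workshop material.

```python
from collections import Counter, defaultdict

# Hypothetical booking events for an Airbnb-style clone (illustrative data only).
EVENTS = [
    {"city": "Bengaluru", "day": "2017-07-01", "amount": 120.0},
    {"city": "Pune", "day": "2017-07-01", "amount": 80.0},
    {"city": "Bengaluru", "day": "2017-07-02", "amount": 150.0},
    {"city": "Mumbai", "day": "2017-07-02", "amount": 200.0},
    {"city": "Bengaluru", "day": "2017-07-02", "amount": 90.0},
]


def bookings_per_day(events):
    """Trend: number of bookings for each day, sorted by day."""
    counts = Counter(e["day"] for e in events)
    return sorted(counts.items())


def revenue_leaderboard(events, top_n=3):
    """Leaderboard: cities ranked by total booking amount, highest first."""
    totals = defaultdict(float)
    for e in events:
        totals[e["city"]] += e["amount"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


if __name__ == "__main__":
    print(bookings_per_day(EVENTS))
    print(revenue_leaderboard(EVENTS))
```

In a real pipeline these roll-ups would more likely run inside a stream processor or a time series database than in application code; the point here is only the shape of the questions being asked of the data.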

In this workshop, you'll build a data pipeline stack for your Airbnb clone. You'll wear multiple hats, including:

  • Engineering or SRE team

  • Customer support team

  • Product managers

  • CXO

and come up with quick, nimble solutions to the questions at hand. You'll also make sure your data pipeline grows, changes, and evolves along with the business needs.

Add to this two important interaction paradigms:

  • conversational patterns using chat bots - NLP and a custom bot server
  • real-time dashboards to track KPIs and KRAs for different stakeholders

Note: Actual businesses are 10x more complex, but this is a good starting point to experiment with and explore these ideas.

Outline

Course Content

Key actionable insights are worth the effort of building a data highway network within your company.

This workshop introduces data pipelines as a concept and helps participants build one for a pseudo business, an Airbnb clone. It will help them get conversant with the technology as well as with thinking about data engineering, i.e. the data you generate and the data you consume.

There are many vendors in this segment, but the impedance mismatch between your engineering practices and data on one side and their feature set on the other is a big issue. Building your own data pipelines is worth the effort as long as you stick to the basics. We'll talk about these key architectural decisions as well.

Key Takeaways:

  • Telemetry and audit events within the application
  • Time series databases and the schema patterns associated with them
  • A few important patterns - Event sourcing, Polyglot persistence, CQRS
  • Transactions and events as a log
  • Chat bots using NLP as an interaction pattern for customer support as well as for SRE/Engineering remote troubleshooting
  • KPI & KRA dashboards using Grafana
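
To make two of the patterns listed above more concrete, here is a minimal sketch of event sourcing combined with CQRS. It assumes a tiny in-memory append-only log; the names `Event`, `EventLog`, `BookingCommands`, and `nights_booked_view` are hypothetical and do not come from the workshop's app stack. Commands only append events, and the read model is rebuilt by replaying the log.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    """An immutable fact appended to the log (hypothetical shape)."""
    kind: str          # "listing_created" or "listing_booked"
    listing_id: str
    nights: int = 0


@dataclass
class EventLog:
    """Append-only log: the source of truth in event sourcing."""
    _events: List[Event] = field(default_factory=list)

    def append(self, event: Event) -> None:
        self._events.append(event)

    def replay(self) -> List[Event]:
        # Return a copy so callers cannot mutate the log.
        return list(self._events)


class BookingCommands:
    """Write side (the C in CQRS): validate a command, then record an event."""

    def __init__(self, log: EventLog) -> None:
        self._log = log

    def create_listing(self, listing_id: str) -> None:
        self._log.append(Event("listing_created", listing_id))

    def book(self, listing_id: str, nights: int) -> None:
        if nights <= 0:
            raise ValueError("nights must be positive")
        self._log.append(Event("listing_booked", listing_id, nights))


def nights_booked_view(events: List[Event]) -> Dict[str, int]:
    """Read side (the Q in CQRS): a projection rebuilt by replaying events."""
    view: Dict[str, int] = {}
    for e in events:
        if e.kind == "listing_created":
            view.setdefault(e.listing_id, 0)
        elif e.kind == "listing_booked":
            view[e.listing_id] = view.get(e.listing_id, 0) + e.nights
    return view


if __name__ == "__main__":
    log = EventLog()
    commands = BookingCommands(log)
    commands.create_listing("L1")
    commands.book("L1", 2)
    print(nights_booked_view(log.replay()))
```

Because the view is derived purely from the log, a new dashboard or report can be added later by writing another projection over the same events, without touching the write side.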

Requirements

  • Basic programming skills in Java, JavaScript, etc.
  • Basic understanding of AWS services: EC2, S3, Kinesis, Lambda

We can accommodate 40 participants to make sure we finish on time (3 hours).

We'll share AWS service account keys, along with a custom-built app stack, with each participant. The whole set of exercises takes 3-4 hours to complete. I'll share the associated costs shortly.

Speaker bio

Ketan has been working on key data pipeline projects for the last few years. Building a transaction log for ad-tech transactions, along with audit and throughput event streams, helped his earlier employer (a startup) solve very disparate goals through the same engineering infrastructure (reducing latency and optimizing the business).

In his current job, Ketan, Rupesh, Sumeet, and the rest of Ketan's team have built a complete fault monitoring solution for a few thousand servers using a few off-the-shelf open source components. This includes a home-grown alert management component, a bot server, and custom reporting jobs. They consider all of these to be applications on a continuously evolving data pipeline.
This not only helps them solve production issues but also helps product management decide which features give the most bang for the buck. It also helps the SRE team remotely troubleshoot data using a conversational pattern with smart bot integration in Slack.

Slides

https://www.slideshare.net/morbid/fifth-elephant-2017-data-pipeline-workshop

Hosted by

Jump starting better data engineering and AI futures