Unless you measure it, you can’t improve it - Data pipelines for your business KPIs and KRAs
Submitted by Ketan Khairnar (@ketankhairnar) on Thursday, 8 June 2017
Any business can gain an unfair advantage through actionable insights using data pipelines and some common sense. We’re already experiencing this through our interactions online (Amazon, medium.com) and through mobile apps (Uber, Ola and many more).
Important advantages of data pipelines are:
- surge in productivity - through a nice CQRS interface
- informed decision making - through trends, aggregations, leaderboards
- a platform for A/B testing your business goals
In this workshop, you’ll be building a data pipeline stack for your Airbnb clone. You’ll be wearing multiple hats, including:
- Engineering or SRE team
- Customer support team
- Product managers
and come up with quick and nimble solutions for the questions at hand. You’ll also make sure to grow, change and evolve your data pipeline along with the business needs. Add to this two important interaction paradigms:
- conversational patterns using chat bots - NLP and custom bot server
- real time dashboards to track KPIs and KRAs – for different stakeholders
Note: Actual businesses are 10x more complex, but this is a good starting point to experiment and explore these ideas.
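As a rough illustration of the dashboard side, a KPI such as "bookings per hour" can be derived by bucketing timestamped events. The sketch below is only an assumption about what such a rollup might look like: the event names (`booking_created`, `search`), the `hourly_counts` helper and the sample data are all hypothetical, and a real pipeline would more likely push this work into a time series database.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Dict, Iterable, Tuple


def hourly_counts(events: Iterable[Tuple[datetime, str]], kind: str) -> Dict[str, int]:
    """Count events of one kind per UTC hour bucket (hypothetical helper)."""
    counts: Counter = Counter()
    for ts, event_kind in events:
        if event_kind == kind:
            # Truncate the timestamp to the hour to form the bucket key.
            bucket = ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:00Z")
            counts[bucket] += 1
    return dict(counts)


# Made-up sample events for the Airbnb clone: (timestamp, event kind).
sample_events = [
    (datetime(2017, 6, 8, 9, 5, tzinfo=timezone.utc), "booking_created"),
    (datetime(2017, 6, 8, 9, 40, tzinfo=timezone.utc), "booking_created"),
    (datetime(2017, 6, 8, 9, 45, tzinfo=timezone.utc), "search"),
    (datetime(2017, 6, 8, 10, 15, tzinfo=timezone.utc), "booking_created"),
]

bookings_per_hour = hourly_counts(sample_events, "booking_created")
```

Each bucket key and its count form one point on a time series, which is the shape a dashboard panel typically expects.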
Key actionable insights are worth the effort of building a data highway network within your company.
This workshop will introduce data pipelines as a concept and help participants build one for a pseudo business, aka an Airbnb clone. It will help them get conversant with the technology as well as with thinking about data engineering, i.e. the data you generate and the data you consume.
There are many vendors in this segment, but the impedance mismatch between your engineering practices and data and their feature set is a big issue. Building your own data pipelines is worth the effort as long as you stick to the basics. We’ll talk about these key architectural decisions as well.
- Telemetry and Audit events within the application
- Time series database and schema patterns associated with it
- A few important patterns - Event sourcing, Polyglot persistence, CQRS
- Transactions and Events as log
- Chat bots using NLP as interaction pattern for customer support as well as for SRE/Engineering for remote troubleshooting
- KPI & KRA dashboards using Grafana
- Basic understanding of AWS services EC2, S3, Kinesis, Lambda
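To make the "events as log", event sourcing and CQRS items above a little more concrete, here is a minimal in-memory sketch. It is not the workshop's actual stack: the `Event` schema, the `EventLog` class and the `project_revenue` projection are hypothetical names chosen for illustration, and a real setup would use a durable stream rather than a Python list.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    """An immutable fact recorded in the log (hypothetical schema)."""
    kind: str          # e.g. "booking_created" or "booking_cancelled"
    listing_id: str
    amount: float


class EventLog:
    """Write side: an append-only, in-memory stand-in for a durable log."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        # Events are only ever appended, never updated or deleted.
        self._events.append(event)

    def replay(self) -> List[Event]:
        # Return a copy so readers cannot mutate the log.
        return list(self._events)


def project_revenue(events: List[Event]) -> Dict[str, float]:
    """Read side: fold the log into a per-listing revenue view."""
    revenue: Dict[str, float] = defaultdict(float)
    for e in events:
        if e.kind == "booking_created":
            revenue[e.listing_id] += e.amount
        elif e.kind == "booking_cancelled":
            revenue[e.listing_id] -= e.amount
    return dict(revenue)


log = EventLog()
log.append(Event("booking_created", "listing-1", 120.0))
log.append(Event("booking_created", "listing-2", 80.0))
log.append(Event("booking_cancelled", "listing-1", 120.0))
log.append(Event("booking_created", "listing-1", 150.0))

revenue_view = project_revenue(log.replay())
```

Because the read model is rebuilt purely from the log, a new question (say, cancellations per listing) only needs a new projection over the same events, which is one reason the write and read sides are kept separate.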
We can accommodate 40 participants to make sure we finish on time (3 hours).
We’ll be sharing AWS service account keys along with a custom-built app stack for each participant. It will take 3-4 hours to complete the whole set of exercises. I’ll share the costs associated with it shortly.
Ketan has been working on key data pipeline projects for the last few years. Building a transaction log for ad-tech transactions, along with audit and throughput event streams, helped his earlier employer (a startup) solve very disparate goals through the same engineering infrastructure (reducing latency and optimizing the business).
In his current job, Ketan, Rupesh, Sumeet and the rest of Ketan’s team have built a complete fault monitoring solution for a few thousand servers using a few off-the-shelf open source components. This includes a home-grown alert management component, a bot server and custom reporting jobs. We consider all of these as applications on a continuously evolving data pipeline.
This not only helps them solve production issues but also helps product management decide which features offer the most bang for the buck. It also helps the SRE team remotely troubleshoot data using a conversational pattern with smart bot integration in Slack.