Sarthak Dev

@sarthakdev

Designing a Data Pipeline at Scale

Submitted Apr 15, 2019

At Freshworks, we deal with petabytes of data every day. For our data science teams to read online data, run ETL jobs, and push out relevant predictions quickly, it's imperative to run a strong and efficient data pipeline. In this talk, we'll go through best practices for designing and architecting such pipelines.

Outline

  • The role of a data engineer

    • Evolution of the role
    • Working closely with adjacent teams
  • Architecture

    • Designing the data science pipeline
    • Feature engineering
    • Pre-processing
    • R vs Python vs Scala
    • Training vs Serving
  • Scale by design

    • Batch vs Stream
      • Leveraging streaming services (Kafka)
      • Dealing with online event data
      • Batch processing
    • Storage
      • Data at rest vs real-time data
  • Building for Freshworks

    • Numbers
    • Complete architecture walkthrough
    • Scaling
  • A quick view of monitoring

    • Monitoring your ETL
    • Health of data
    • Optimising your alerts
      • Webhook alert systems
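
To give a flavour of the monitoring topics above, here is a minimal sketch of a webhook-style ETL alert. The endpoint URL, field names, and threshold check are all hypothetical illustrations, not the system described in the talk:

```python
import json
import urllib.request

# Hypothetical webhook endpoint; replace with your alerting service's URL.
WEBHOOK_URL = "https://alerts.example.com/hooks/etl"

def build_alert(job_name, status, rows_processed, threshold):
    """Build a JSON-serialisable alert payload for an ETL job run.

    Flags the run as unhealthy when it failed, or when it processed
    fewer rows than expected (a simple data-health check).
    """
    healthy = status == "success" and rows_processed >= threshold
    return {
        "job": job_name,
        "status": status,
        "rows_processed": rows_processed,
        "healthy": healthy,
    }

def send_alert(payload, url=WEBHOOK_URL):
    """POST the alert payload to the webhook endpoint."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

# Example: a run that completed but processed suspiciously few rows.
alert = build_alert("daily_user_etl", "success", rows_processed=120, threshold=1000)
```

The point of separating `build_alert` from `send_alert` is that the health logic stays testable without any network dependency, which also makes it easy to route the same payload to multiple webhook consumers.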

Requirements

Laptop

Speaker bio

I’ve been working as a Data Engineer at Freshworks for the last three years. Prior to that, I worked for four years at three early stage startups (including Airwoot) as a backend/data engineer.
