At Freshworks, we deal with petabytes of data every day. For our data science teams to read online data, run ETL jobs, and push out relevant predictions quickly, it's imperative to run a strong, efficient data pipeline. In this talk, we'll go through best practices for designing and architecting such pipelines.
- The role of a data engineer
  - Evolution of the role
  - Working closely with adjacent teams
- Designing the data science pipeline
  - Feature engineering
  - Pre-processing
  - R vs Python vs Scala
  - Training vs Serving
  - Batch vs Stream
- Leveraging streaming services such as Kafka (see the consumer sketch after this outline)
  - Dealing with online event data
- Batch processing (see the batch-job sketch after this outline)
- Storage
  - Data at rest vs real-time data
  - Numbers
- Complete architecture walkthrough
- Scaling
- A quick view of monitoring
  - Monitoring your ETL
  - Health of data
  - Optimising your alerts
  - Webhook alert systems (see the alerting sketch after this outline)
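To make the streaming portion concrete, here is a minimal sketch of consuming online event data from a Kafka topic in Python with the kafka-python client. The topic name, consumer group, broker address, and event schema are hypothetical placeholders, not details taken from the talk.

```python
import json

from kafka import KafkaConsumer  # pip install kafka-python

# Hypothetical topic and broker; substitute your deployment's values.
consumer = KafkaConsumer(
    "user-events",
    bootstrap_servers=["localhost:9092"],
    group_id="ds-pipeline",            # consumer group tracks committed offsets
    auto_offset_reset="earliest",      # replay from the start if no offset exists
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

# Each message is one online event; hand it to feature extraction or storage.
for message in consumer:
    event = message.value
    print(message.topic, message.partition, message.offset, event.get("type"))
```

Giving each downstream job its own consumer group keeps offsets independent, so a slow batch writer does not hold back a real-time feature extractor.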
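For the batch-processing leg, one common shape is a scheduled job that reads a day of raw events and writes aggregates back to storage. The outline does not name a specific engine; the sketch below assumes PySpark, and the S3 paths, column names, and partitioning scheme are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-event-rollup").getOrCreate()

# Hypothetical input: one day's partition of raw JSON events.
events = spark.read.json("s3://example-bucket/events/raw/dt=2019-06-01/")

# Roll up event counts per account and event type for downstream models.
daily_counts = (
    events
    .groupBy("account_id", "event_type")
    .agg(F.count("*").alias("n_events"))
)

# Columnar output at rest, overwritten idempotently so reruns are safe.
daily_counts.write.mode("overwrite").parquet(
    "s3://example-bucket/events/rollups/dt=2019-06-01/"
)

spark.stop()
```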
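On the monitoring side, a webhook alert system can be as small as an HTTP POST to a chat tool's incoming webhook whenever an ETL health check trips. The webhook URL, job name, and row-count threshold below are hypothetical, and the check stands in for whatever data-health metric the pipeline exposes.

```python
import requests

# Hypothetical Slack-style incoming webhook; replace with your own endpoint.
WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXXXXXX"

def alert(job_name: str, message: str) -> None:
    """Post a short alert for a failed or unhealthy ETL job."""
    payload = {"text": f":rotating_light: {job_name}: {message}"}
    resp = requests.post(WEBHOOK_URL, json=payload, timeout=5)
    resp.raise_for_status()

# Example health check: alert when the day's load falls below an expected floor.
rows_loaded = 0  # in practice, read from the job's own metrics
if rows_loaded < 1000:
    alert("daily-event-rollup", f"only {rows_loaded} rows loaded, expected >= 1000")
```

Threshold-based alerts fired once per job run, rather than per record, help keep the signal-to-noise ratio high and avoid alert fatigue.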
Requirements: Laptop
I’ve been working as a Data Engineer at Freshworks for the last three years. Prior to that, I spent four years as a backend/data engineer at three early-stage startups, including Airwoot.