How to make a kickass data platform with Spark and S3

Submitted Jul 1, 2019

Session type: Full talk of 40 mins

In this talk, we will explore the advantages and challenges of running an in-house data platform on Spark and S3. We will also discuss how to add essential features such as autoscaling and access control to your platform. The latter part of the talk covers ways to organise data in S3, storage formats for big data, and indexing to improve read performance for big-data use cases. Overall, the intention of this talk is to share the problems we faced while scaling our data platform and some of the solutions that worked for us.

Outline

Introduction to Spark and S3
Essential features of a data platform
Autoscaling
Access Control
Storage formats for big data
Improving read performance of data in S3
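
As an illustration of the last outline item, here is a minimal sketch of one common way to organise data in S3 so that readers touch less of it: date-partitioned key prefixes in the Hive-style `key=value` form that Spark produces when writing with `partitionBy`. The proposal does not say which layout the talk uses, so the table name `events`, the partition column `dt`, and the helper functions below are all hypothetical. The sketch uses plain Python on key strings rather than a live S3 bucket.

```python
from datetime import date, timedelta
from typing import List


def partition_prefix(table: str, day: date) -> str:
    """Build a Hive-style partition prefix, e.g. 'events/dt=2019-07-01/'.

    The 'dt' column name is an assumption made for this sketch.
    """
    return f"{table}/dt={day.isoformat()}/"


def prefixes_for_range(table: str, start: date, end: date) -> List[str]:
    """Return the partition prefixes covering start..end inclusive."""
    days = (end - start).days
    return [partition_prefix(table, start + timedelta(days=i)) for i in range(days + 1)]


def prune(keys: List[str], prefixes: List[str]) -> List[str]:
    """Keep only the object keys that fall under one of the wanted prefixes."""
    wanted = tuple(prefixes)
    return [key for key in keys if key.startswith(wanted)]


# Hypothetical object keys, one Parquet file per daily partition.
keys = [
    "events/dt=2019-06-30/part-0000.parquet",
    "events/dt=2019-07-01/part-0000.parquet",
    "events/dt=2019-07-02/part-0000.parquet",
]

wanted = prefixes_for_range("events", date(2019, 7, 1), date(2019, 7, 2))
selected = prune(keys, wanted)
print(selected)
```

With a layout like this, a query for a date range only needs to list and read the objects under the matching prefixes instead of scanning the whole table.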

Speaker bio

I have been working on big-data pipelines for the past 5 years, first at my startup, retention.ai, and later at inShorts. I currently work as a backend engineer at Zendrive.


Hosted by

The Fifth Elephant

Jumpstart better data engineering and AI futures

The Fifth Elephant round the year submissions for 2019