The Fifth Elephant round the year submissions for 2019

Submit a talk on data, data science, analytics, business intelligence, data engineering and ML engineering

How to make a kickass data platform with spark and S3

Submitted by Anshul Singhle on Monday, 1 July 2019

Session type: Full talk of 40 mins

Abstract

In this talk, we will explore the advantages and challenges of running an in-house data platform on Spark and S3. We will also discuss how to add essential features such as autoscaling and access control to your platform. The latter part of the talk will cover ways to organise data in S3, storage formats for big data, and indexing to improve read performance for big-data use cases. Overall, the intention of this talk is to share the problems we faced while scaling our data platform and some of the solutions that worked for us.

Outline

  • Introduction to Spark and S3
  • Essential features of a data platform
  • Autoscaling
  • Access Control
  • Storage formats for big data
  • Improving read performance of data in S3
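
As an illustration of the last two outline items, here is a minimal sketch of one common way to organise objects in S3 so that readers can skip irrelevant data: Hive-style `name=value` partition prefixes. This is not necessarily the layout used by the speaker; the table name, partition columns (`dt`, `event_type`) and helper functions below are all hypothetical, and the code works on plain key strings rather than calling S3 or Spark.

```python
from datetime import date


def partition_key(table, day, event_type, filename):
    """Build a Hive-style partitioned object key (hypothetical layout)."""
    return f"{table}/dt={day.isoformat()}/event_type={event_type}/{filename}"


def parse_partitions(key):
    """Extract partition columns (name=value path segments) from a key."""
    parts = {}
    for segment in key.split("/")[:-1]:  # skip the file name itself
        if "=" in segment:
            name, value = segment.split("=", 1)
            parts[name] = value
    return parts


def prune(keys, **filters):
    """Keep only keys whose partition values match every filter."""
    return [
        k for k in keys
        if all(parse_partitions(k).get(col) == val for col, val in filters.items())
    ]


# Example listing of object keys (assumed data, for illustration only).
keys = [
    partition_key("events", date(2019, 7, 1), "click", "part-0000.parquet"),
    partition_key("events", date(2019, 7, 1), "view", "part-0000.parquet"),
    partition_key("events", date(2019, 7, 2), "click", "part-0000.parquet"),
]

# Only objects under the matching prefixes need to be read.
selected = prune(keys, dt="2019-07-01", event_type="click")
print(selected)
```

The idea is that a query filtering on the partition columns only has to open the files under the matching prefixes, which is the same principle engines such as Spark apply when they prune partitions on a directory-partitioned dataset.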

Speaker bio

I have been working on big-data pipelines for the past 5 years, first at my startup, retention.ai, and later at inShorts. I am currently working as a backend engineer at Zendrive.

Comments

  • Venkata Pingali (@pingali) 5 months ago

    Hi! Anshul,

    It is a common pattern to use spark with S3 backend. Glad you
    are talking about how to approach it systematically.

    What will help your proposal is slides and details. Would love to
    see the practical issues faced along with any quantification and
    approaches.

    -Venkata

  • Abhishek Balaji (@booleanbalaji) Reviewer 5 months ago

    Anshul, without slides or more information, we cannot proceed with the proposal. Do add details ASAP.

  • Anshul Singhle Proposer 5 months ago

    I’m working on the slides and will add them soon. Thanks.

    • Abhishek Balaji (@booleanbalaji) Reviewer 5 months ago

      Hi Anshul,

      We’re likely to consider this topic for a Birds of a Feather session, since there may not be enough time to review your proposal and incorporate changes. We recognize that this is an important topic to discuss, and hence are scheduling a Birds of a Feather session. These are off-the-record discussions, facilitated by a few folks from the community, with interested audience members participating. Because they are off the record, participants enjoy having a free-flowing discussion. Do let me know if you’d be interested in joining this session and I’ll send you more details over email.

  • Anshul Singhle Proposer 4 months ago

    Sure, we can make it a BOF session. Are slides required for the same?
