Rootconf Hyderabad edition

On SRE, systems engineering and distributed systems

Accepting submissions till 30 Sep 2019, 11:59 PM

T-Hub, Hyderabad

About Rootconf Hyderabad:

Rootconf Hyderabad is a platform for:

  1. DevOps engineers
  2. Site Reliability Engineers (SRE)
  3. ML and data engineers
  4. Security and DevSecOps professionals
  5. Software engineers

to discuss real-world problems around:

  1. Site Reliability Engineering (SRE)
  2. Data and AI engineering
  3. Distributed systems – observability, microservices
  4. Implementing Infrastructure as Code

Speakers from Flipkart, Hotstar, Intuit, GO-JEK, MadStreetDen and Trusting Social will share their experiences with the above challenges.

Event venue:

Rootconf Hyderabad will be held at T-Hub, IIIT-Hyderabad Campus, Gachibowli, Hyderabad, Telangana - 500032

Contact information:

For bulk ticket purchases, sponsorship and other inquiries, contact sales@hasgeek.com or call 7676332020.

Sponsors:

Rootconf Hyderabad 2019 sponsors:


Platinum Sponsor

Atlassian

Bronze Sponsors

UpCloud, Elastic, HashiCorp

For information about the event, tickets (bulk discounts automatically apply on 5+ and 10+ tickets) and speaking, call Rootconf on 7676332020 or write to info@hasgeek.com.

Hosted by

Rootconf is a forum for discussions about DevOps, infrastructure management, IT operations, systems engineering, SRE and security (from an infrastructure defence perspective).

Ankur Jain

@anvinjain

Fast object distribution using P2P

Submitted May 11, 2019

At Flipkart, objects such as VM/container images, media and ML models are stored in a homegrown object storage service. To improve download latencies and avoid being rate limited, teams normally put nginx or some other cache in front of frequently used objects. This works for small objects, for less dynamic content, or when only a few clients download concurrently. But when hundreds of clients need to download large objects concurrently, the network bandwidth of the cache cluster gets choked, which increases download latency. Cache misses lengthen the total download duration further.

Shatabdi solves this problem by transparently converting any given HTTP resource to a torrent and downloading it on all participating clients in peer-to-peer (P2P) fashion. It can also be extended to support non-HTTP sources of truth. The Shatabdi daemon abstracts out all the noise and provides a locally hosted REST API to manage downloads and uploads.
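As a rough illustration of what converting a resource to a torrent involves, here is a minimal sketch of single-file BitTorrent (v1) metainfo generation: the payload is split into fixed-size pieces, each piece is SHA-1 hashed, and the info-hash is the SHA-1 of the bencoded info dictionary. This is not Shatabdi's implementation; the piece size, the function names and the tracker URL passed to `magnet_link` are assumptions made for the example.

```python
import hashlib
from urllib.parse import quote

PIECE_LENGTH = 256 * 1024  # assumed piece size, chosen only for illustration


def bencode(value):
    """Encode ints, str/bytes, lists and dicts using BitTorrent bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def build_info(name, data, piece_length=PIECE_LENGTH):
    """Build the single-file 'info' dictionary for an in-memory payload."""
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    return {
        "name": name,
        "length": len(data),
        "piece length": piece_length,
        "pieces": pieces,  # concatenated 20-byte SHA-1 digests, one per piece
    }


def info_hash(info):
    """The swarm identifier: SHA-1 of the bencoded info dictionary."""
    return hashlib.sha1(bencode(info)).hexdigest()


def magnet_link(info, tracker_url):
    """Magnet URI that peers can use to join the swarm via a tracker."""
    return (
        f"magnet:?xt=urn:btih:{info_hash(info)}"
        f"&dn={quote(info['name'])}"
        f"&tr={quote(tracker_url, safe='')}"
    )
```

A real service would stream the HTTP body instead of holding it in memory and would store the resulting metainfo so that every client resolves the same URL to the same swarm.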

Shatabdi is now live in Flipkart datacenters, where it is used to distribute VM and container images to motherships. We foresee this platform being used in many other scenarios, such as:

  • Warming up nodes in caching cluster on startup

  • Distributing pre-computed datasets to multiple instances of a datastore

  • Distributing ML models that are required by multiple nodes

Some numbers for the interested. These involve no cache apart from a single download from an HTTP source of truth (the object storage system at Flipkart):

  • Disk-based download
    512 clients downloading 3.2 GB concurrently in ~2-4 minutes, with upload speed capped at 24 mbps

  • TmpFS-based download
    50 clients downloading 3.2 GB concurrently in ~15-25 seconds, with a warmed-up seeder
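To show why peer-to-peer distribution scales differently from a shared source or cache, the sketch below computes the standard fluid-model lower bounds on distribution time for both approaches. It is a back-of-the-envelope model, not a reproduction of the benchmarks above: the function and parameter names are assumptions, and all bandwidths must be supplied by the reader in bits per second.

```python
def client_server_time(file_bits, n_clients, source_up, client_down):
    """Lower bound (seconds) when every client pulls the full object from one source."""
    return max(
        n_clients * file_bits / source_up,  # source uploads one copy per client
        file_bits / client_down,            # each client must receive a full copy
    )


def p2p_time(file_bits, n_peers, source_up, peer_up, peer_down):
    """Lower bound (seconds) when peers also upload pieces to each other."""
    return max(
        file_bits / source_up,   # source must push at least one full copy
        file_bits / peer_down,   # each peer must receive a full copy
        # total bits demanded divided by total upload capacity in the swarm
        n_peers * file_bits / (source_up + n_peers * peer_up),
    )


if __name__ == "__main__":
    # Illustrative values only; none of these bandwidths come from the talk.
    file_bits = 3.2e9 * 8
    for n in (50, 512):
        cs = client_server_time(file_bits, n, source_up=10e9, client_down=10e9)
        p2p = p2p_time(file_bits, n, source_up=10e9, peer_up=1e9, peer_down=10e9)
        print(f"{n} clients: client-server >= {cs:.0f}s, p2p >= {p2p:.0f}s")
```

In the client-server bound the first term grows linearly with the number of clients, while in the P2P bound the aggregate upload capacity grows with the swarm, so the bound stays roughly flat as clients are added.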

This talk will focus on the motivation behind building Shatabdi and take a technical deep dive into its implementation. Broadly, we will cover performance tuning of open-source torrent clients, functional flows that ensure maximal bandwidth consumption and always-healthy magnets, and benchmarking of large deployments on container-based clusters at Flipkart.

For any use case that involves distributing large objects to many concurrent clients at scale, this talk should give valuable insights into architecting solutions using P2P.

Outline

  • Motivation for fast object distribution
  • Goals for a torrent-based distribution layer
  • Exploring open-source alternatives
  • Design, architecture and functional flows of Shatabdi
    • Indexing store and leader election using etcd
    • Achieving HA for torrent tracker
  • Performance tuning of libtorrent
  • Benchmarking container deployments powered by Shatabdi
  • Adoption @ Flipkart

Speaker bio

Ankur works on the Platform-as-a-Service team at Flipkart Cloud Platform, where he builds developer-centric platforms that work at Flipkart scale.

Links

Slides

https://docs.google.com/presentation/d/144puJStK8ZVvd43XRRP5Jxi9TBGxzdg0xx1CTmmmGyw/edit?usp=sharing
