Rootconf Mini 2024 (on 22nd & 23rd Nov)

Geeking out on systems and security since 2012

Snehasish Roy

@snehasishroy

Zero Downtime, Zero Compromise: How PhonePe's DocStore Handles Billions of Documents

Submitted Oct 28, 2024

Overview

Ever wondered what happens when millions of PhonePe users share documents, buy insurance, or upload KYC information? Enter DocStore - the powerhouse behind PhonePe’s massive document operations. This home-grown object storage platform seamlessly handles thousands of critical transactions, from instant chat attachments to vital insurance documents, powering both customer experiences and developer platforms.

Picture this: Every time you send a photo in PhonePe chat, submit documents for insurance, or interact with our developer portals, you’re tapping into DocStore’s capabilities. It’s not just a storage system - it’s the digital vault that safeguards and serves documents for India’s leading fintech platform, processing terabytes of data while ensuring bank-grade security and lightning-fast accessibility.

Unlike most fintech players who rely on public clouds, PhonePe took the road less traveled - building everything on our private cloud infrastructure. But with great control comes greater responsibility: how do you ensure zero downtime when handling billions of critical documents across multiple data centers?

Our answer came in the form of an Active-Passive architecture. Through clever engineering with GlusterFS Geo Replication and our custom-built ElasticSearch replication plugin, we’ve created a system that stays resilient even when entire data centers go dark. Join us as we unveil the challenges we tackled, the solutions we crafted, and the lessons we learned while building DocStore.

Agenda

  • Design and architecture for providing storage at petabyte scale.
  • Tech stack: GlusterFS, ElasticSearch, RabbitMQ, Aerospike.
  • Challenges faced in GlusterFS Geo Replication.
  • Challenges faced in enabling replication in an ElasticSearch cluster.

Takeaways

  • Build vs Buy
  • Dos and don’ts for managing infrastructure at scale
  • Critical considerations for implementing fault tolerance
  • Practical insights for large-scale storage systems

Audience

  • Site Reliability and DevOps Engineers
  • Engineering leaders
  • Cloud architects and engineers
  • Teams building large-scale storage solutions
