Rootconf 2017

On service reliability

Mesos, KVM and the story of Infrastructure at PhonePe

Submitted by Burzin Engineer (@burzinengineer) on Jan 24, 2017

Section: Full talk of 40 mins duration Technical level: Advanced Status: Rejected


PhonePe is a mobile payment solution based on NPCI UPI. PhonePe's infrastructure runs on a combination of Docker containers, VMs and bare-metal servers. This talk focuses on an internal cloud solution that helps manage these components using Mesos.


Brief Introduction to PhonePe Infrastructure 5m

  • Layout of applications on Mesos with the Marathon framework: these run exclusively in containers, implement only business logic, and are written in Java or Node.js.
  • Applications on VMs (e.g. load balancers)
  • Applications on bare-metal servers (e.g. databases)
  • Core infrastructure components (routers, firewalls, tunnels, DNS, DHCP, etc.)

The Problem 5m

  • DevOps needs to manage an environment of containers, virtual machines and bare-metal servers seamlessly.
  • Need to manage resources such as CPU, memory and private IPs
  • Constraints such as tenancy, apps on SSD, PCI, etc.
  • Multiple operating systems and versions
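
One way to picture how resource needs and placement constraints like those above are usually expressed for Marathon is an app definition. The sketch below builds one as a plain Python dict. The field names (`id`, `cpus`, `mem`, `instances`, `container`, `constraints`) and the `CLUSTER` and `UNIQUE` operators follow Marathon's app-definition JSON; the app id, image name and the `disk_type` agent attribute are hypothetical and are not taken from PhonePe's setup.

```python
import json


def marathon_app(app_id, image, cpus, mem_mb, instances, constraints=()):
    """Build a Marathon app definition as a plain dict.

    The keys follow Marathon's app-definition JSON; every value supplied
    by the caller below is illustrative only.
    """
    return {
        "id": app_id,
        "cpus": cpus,          # CPU shares requested per instance
        "mem": mem_mb,         # memory per instance, in MiB
        "instances": instances,
        "container": {
            "type": "DOCKER",
            "docker": {"image": image},
        },
        # Each constraint is [field, operator] or [field, operator, value].
        "constraints": [list(c) for c in constraints],
    }


app = marathon_app(
    app_id="/payments/ledger",            # hypothetical app id
    image="registry.example/ledger:1",    # hypothetical image name
    cpus=0.5,
    mem_mb=512,
    instances=3,
    constraints=[
        ("disk_type", "CLUSTER", "ssd"),  # hypothetical agent attribute
        ("hostname", "UNIQUE"),           # at most one instance per host
    ],
)

print(json.dumps(app, indent=2))
```

The resulting JSON is the kind of document that would be submitted to Marathon; tenancy or PCI placement could be modelled the same way, assuming the agents advertise matching attributes.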

Mesos Frameworks 12m

The big picture 12m

  • PhonePe Cloud implementation
  • Multiple environments (staging, integration)
  • Multiple data centers
  • How everything (the application Mesos cluster, the virtual machine cloud and the bare-metal servers) ties into a single seamless infrastructure
  • Short update on integration of DNS

Q & A 5m



Speaker bio

Burzin Engineer is the cofounder and Chief Reliability Officer at PhonePe.




  • Philip Paeps (@trouble) 3 years ago

    In order to evaluate your proposal for a full 40 minute slot, the editorial panel will need a more detailed outline than this, ideally broken down with a rough indication of timing.

    It would be helpful if you could specify what specific problems you encountered in your environment and how the tools you will talk about solved your problem. For an example of the kind of outline we are looking for, take a look at this proposal:

    • Zainab Bawa (@zainabbawa) Crew 3 years ago

      To add to Philip’s point, why is PhonePe’s use case interesting to anyone in the audience who is not from the payments domain?

  • Krishnan V.R. (@krishnanvr) 3 years ago

    Updated. Shall continue to refine it as the slides get built. Hope this works!

    • Philip Paeps (@trouble) 3 years ago

      Thanks very much. This is a lot better!

      It would be good if you could add a little bit more depth to your “big picture” since that’s probably where the main focus of your presentation should be. Spending a little bit of time on Mesos frameworks is okay, but I don’t feel it needs to be like a “howto” or a “worked example”. A very high level overview of the moving parts probably suffices. That’ll give you more time for your big picture.

      I’m particularly interested in why you pick out DNS specifically but I admit to having an unhealthy relationship with DNS. I’d also like to know a bit more about how multiple data centres fit in.

    • Zainab Bawa (@zainabbawa) Crew 3 years ago (edited 3 years ago)

      Couple of questions:
      1. Adding to the point that Trouble made about the conclusion: what is it that other payment companies can learn from your DevOps story? What is it that audiences from verticals other than payments can learn from this story? In other words, what is in it for me, as an audience member, to attend this talk? What am I going to learn? Please articulate your insights more clearly.
      2. The speaker mentioned in the bio is Burzin whereas you are Krishnan. Is this some case of pseudo-anonymity? Who is the ‘real’ speaker? :)
      3. Please upload a two-minute self-recorded video where the speaker explains what this talk is about, and why an audience member should attend it. We require this to assess the proposer’s speaking skills before we make a final decision.

      • Krishnan V.R. (@krishnanvr) 3 years ago

        1.) The talk is not specific to payment companies. It delves into how the infrastructure was adapted to accommodate a host of new technologies while being built from the ground up, as opposed to hosting directly on a cloud, without compromising on the flexibility the application architecture needs in order to serve business needs. I believe this talk would be interesting for anyone who is considering setting up their own infrastructure in a colocation facility, for any reason.
        2.) Burzin and I work together as a team and he does the talk. Hope that is sorted out.
        3.) Done.

        • Zainab Bawa (@zainabbawa) Crew 3 years ago (edited 3 years ago)

          The proposed talk needs more focus than telling the audience why Mesos. The interesting question is why Mesos for PhonePe and not something else.

          The conference editors suggest that the talk takes a deep dive into one of these questions:
          1. Why pick Mesos over Kubernetes?
          2. What is your failover and backup strategy, given that PhonePe has to operate in a PCI-compliant environment?
          3. Given that PhonePe operates in multiple data centres, how do you manage low latency, partitioning and standard challenges?

          The proposal / draft slides need to address one of these questions to make the talk interesting for the audience. Otherwise, approaching this as a Mesos talk is generic and will not be of interest to attendees.

      • Krishnan V.R. (@krishnanvr) 3 years ago

        Hello, how and where can I send the video?

        • Zainab Bawa (@zainabbawa) Crew 3 years ago

          Please send the link to your recorded video to with the subject header as Rootconf 2017 and the title of your proposal.

  • Burzin Engineer (@burzinengineer) Proposer 3 years ago

    The talk will mostly address the following

    • PhonePe cloud KVM plugin with tight integration into Mesos
    • We will not be discussing failover at this point; maybe a topic for next year?
    • We have private P2P lines between DCs and also an IPsec backup link over the internet

    • The question of Kubernetes vs. Mesos is an interesting one.

    We chose Mesos because it has proven scale for tens of thousands of servers. Kubernetes is a cluster manager for containers (only?), while Mesos is a distributed-systems kernel that makes your cluster look like one giant computer to all supported frameworks and apps that are built to run on Mesos. Kubernetes is itself one framework (among others) that can be run on Mesos.
    As far as I know, it’s not easy or possible to build your own frameworks on top of Kubernetes (not 100% sure on this).

    Mesos abstracts the underlying hardware (e.g. bare metal or VMs) away and just exposes the resources. It contains primitives for writing distributed applications, such as message passing and task execution, so entirely new applications become possible. Apache Spark is one example of a framework (the Mesos jargon for such an application) that was originally built for Mesos; Chronos is another. This enabled really fast development: the developers of Spark didn’t have to worry about networking to distribute tasks among nodes, as this is a core primitive in Mesos.
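
The idea that Mesos "just exposes the resources" can be pictured with a toy model of its two-level scheduling: the master offers each agent's free resources to a framework, and the framework decides which of its tasks to place on which offer. The sketch below is a simplified simulation in plain Python, not the Mesos API; the `Offer` and `Task` classes, the `schedule` function and the agent names are all hypothetical, and the first-fit placement policy is just one possible choice.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """Free resources on one agent, as offered to a framework."""
    agent: str
    cpus: float
    mem_mb: int


@dataclass
class Task:
    """A unit of work the framework wants to run."""
    name: str
    cpus: float
    mem_mb: int


def schedule(offers, pending):
    """Framework-side decision: first-fit placement of tasks onto offers.

    Returns (placements, unplaced), where placements is a list of
    (task name, agent) pairs and unplaced is a list of task names.
    """
    # Work on copies so the caller's offers are left untouched.
    remaining = [Offer(o.agent, o.cpus, o.mem_mb) for o in offers]
    placements, unplaced = [], []
    for task in pending:
        for offer in remaining:
            if offer.cpus >= task.cpus and offer.mem_mb >= task.mem_mb:
                offer.cpus -= task.cpus
                offer.mem_mb -= task.mem_mb
                placements.append((task.name, offer.agent))
                break
        else:
            # No offer had enough free CPU and memory for this task.
            unplaced.append(task.name)
    return placements, unplaced


# Hypothetical agents: the framework sees only resources, not machine type.
offers = [Offer("baremetal-1", 4.0, 8192), Offer("vm-1", 1.0, 1024)]
tasks = [
    Task("api", 2.0, 4096),
    Task("worker", 2.0, 4096),
    Task("cache", 1.5, 1024),
]

placements, unplaced = schedule(offers, tasks)
print(placements)  # [('api', 'baremetal-1'), ('worker', 'baremetal-1')]
print(unplaced)    # ['cache']
```

In this model the framework never needs to know whether an agent is bare metal or a VM; a real framework would decline the unused offers and wait for new ones.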

    To my knowledge, Kubernetes is not used inside Google in production deployments today. For production, Google uses Omega/Borg, which is much more similar to the Mesos/Marathon model.

    Both are good; the choice depends on your comfort level and skill. We started small and didn’t know how big we would grow or how fast, so we went with Mesos. There is no right or wrong here.

  • Anna John (@annajohnsn007) a year ago

    “PhonePe is an online payment system, which is a new process in electronic funds transfer. It is now used by loads of people in India. It is revolutionizing payments by building India’s Payments App to provide a cashless and seamless payment experience.”
