Anant Shrivastava

@anantshri Reviewer

Aditya Patawari

@adityapatawari Reviewer

Chandrapal Badshah

@badshah Reviewer

Sayan Chowdhury

@yudocaa Reviewer

Review and feedback for Drove - container orchestrator built at PhonePe

Submitted Jan 19, 2025

Details

Review date and time - 17 January 2025, 6 PM - 7 PM
Presenters - Santanu Sinha (Chief Architect at PhonePe); Vishnu Naini (SRE at PhonePe)


About Drove

Drove is a distributed container orchestrator built by PhonePe to address the challenges the team faced when Mesos reached end of life, including outdated security, lack of community support, and the need for numerous plugins.

🔗 Link to the slides shown at the review - https://images.hasgeek.com/embed/file/97396a27bd0b4d29815d59f6a2bf651b

Main challenges

  • Outdated security model of Mesos
  • Lack of community support for Mesos
  • Mesos not originally designed for container-centric architecture
  • Need for numerous plugins to add containerization support to Mesos
  • Inefficient resource utilization and traffic routing
  • Disruptions during maintenance
  • Lack of portability across operating systems and container engines

How Drove is built

  • Drove uses existing platforms like Docker and Podman
  • Drove’s API is built on Dropwizard, a framework for building REST APIs
  • Leverages NGINX or HAProxy for traffic routing
  • CoreDNS plugin as an alternative for traffic routing
  • Additional plugins like Epoch for time-based task scheduling
  • Supports three types of workloads: applications (with instances/containers), tasks (transient jobs), and local services (still in development)
  • Provides CoreDNS and Drove Gateway for traffic routing

Technical architecture

Controller: Orchestrates executors, schedules and allocates resources, tracks workload lifecycles, exposes API and UI, streams logs, supports events, and uses Zookeeper for coordination and leader election.
Executor: Actual execution host where containers run.
Drove Gateway: Interfaces with NGINX for traffic routing (alternative to CoreDNS plugin).
Zookeeper: Used for coordination and leader election.
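
The controller/executor split described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Drove's actual API or scheduling algorithm: the `Executor` and `Controller` classes, the `schedule` method, and the "most free cores" placement rule are all hypothetical names and choices made for illustration. Leader election through Zookeeper and traffic routing are left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Executor:
    """Hypothetical executor host with a fixed number of CPU cores."""
    name: str
    total_cores: int
    allocated: dict = field(default_factory=dict)  # instance id -> cores

    def free_cores(self) -> int:
        # Cores not yet reserved by any instance on this host.
        return self.total_cores - sum(self.allocated.values())


class Controller:
    """Toy controller: tracks executors and places instances on them."""

    def __init__(self, executors):
        self.executors = list(executors)

    def schedule(self, instance_id: str, cores: int) -> str:
        # Assumed placement rule: choose the executor with the most free
        # cores among those that can fit the request.
        candidates = [e for e in self.executors if e.free_cores() >= cores]
        if not candidates:
            raise RuntimeError(f"no executor can fit {cores} cores for {instance_id}")
        target = max(candidates, key=lambda e: e.free_cores())
        target.allocated[instance_id] = cores  # fixed allocation for this instance
        return target.name


executors = [Executor("exec-1", 4), Executor("exec-2", 8)]
controller = Controller(executors)
placements = {i: controller.schedule(i, 3) for i in ("app-1", "app-2", "app-3")}
print(placements)
```

In this sketch the controller owns all placement decisions and the executors only record what has been allocated to them, which mirrors the division of responsibilities listed above in a very simplified form.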


Aditya Patawari’s feedback

  • The presentation lacked a clear narrative and a smooth flow.
  • The motivation for building Drove was not clear, especially considering other options like Rancher and Kubernetes.
  • The ecosystem around Drove and ease of writing specifications (JSON) were questioned.
  • The concept of local tasks and their potential use cases were discussed.

Aditya commended Drove, but suggested improvements to the presentation. He recommended a clearer narrative with a smooth flow, and a diagram showcasing Drove’s role in managing PhonePe’s infrastructure. He emphasized the need to explain the motivation for building Drove, especially considering the availability of other orchestration systems. He also questioned the ecosystem around Drove, the ease of writing JSON specifications, and the concept of local tasks.
Aditya pointed out that the transitions between talking about Drove software and the PhonePe infrastructure were abrupt. He suggested using a diagram at the beginning to illustrate the current setup and how Drove manages it, which would pique the audience’s interest and make the presentation flow better.

Anant Shrivastava’s feedback

  • The motivation for the presentation and the target audience were not clear.
  • Comparisons with other orchestration systems and benchmarking data are missing, making it hard to evaluate Drove’s benefits. (For example, performance improvements from fixed CPU allocation or NUMA were mentioned, but there was no clear comparison showing whether these are available only in Drove, or are available in other systems and Drove handles them better.)
  • The security aspect was mentioned but not elaborated on.
  • The diagrams were confusing because all parts appeared at once. They should be placed before the demo, and should be explained and revealed gradually on the slides.
  • Too many bullet points in the slides and lack of images made the presentation hard to follow.
  • The claim of “proprietary” elements was questioned. (If there is no selling angle, the suggestion is to use “internal” as the keyword instead.)
  • The lack of information about the documentation and contribution guidelines was highlighted.
  • The core unique selling points (USPs) of Drove were not emphasized enough.
  • The demo should focus on Drove’s USPs, such as simplicity and fast cluster upgrades.
  • The motivation for building Drove, including the challenges with existing tools and the reasons for not adopting other solutions, should be clearly stated at the beginning.
  • The slides must have slide numbers.
  • The design philosophy should be clarified at the beginning of the presentation.

Anant’s feedback was that the presentation lacked a clear motivation and target audience, making it difficult for the audience to understand the purpose and relevance of the content. He suggested that the presenters compare Drove with other orchestration systems and provide benchmarking data to showcase its benefits, particularly highlighting whether features like performance improvements through fixed CPU allocation or NUMA are unique to Drove or better implemented compared to alternatives. He also noted the lack of security details, suggesting that this aspect should be elaborated to demonstrate Drove’s robustness. The diagrams were described as confusing because all elements appeared at once; Anant recommended revealing and explaining them gradually on the slides, preferably before the demo. Furthermore, he pointed out the excessive use of bullet points and the lack of visuals, which made the presentation harder to follow. He questioned the claim of “proprietary” elements, suggesting that “internal” might be a better term if there is no clear selling angle. The absence of documentation and contribution guidelines was also highlighted as a significant gap. Anant emphasized that the core unique selling points (USPs) of Drove, such as simplicity and fast cluster upgrades, were not sufficiently emphasized and should be the primary focus of the demo. He further recommended clearly stating the motivation for building Drove, including the challenges with existing tools and reasons for not adopting alternatives, at the beginning of the presentation. Additional suggestions included adding slide numbers for easier navigation and clarifying the design philosophy early in the presentation to establish a cohesive narrative.

Specific suggestions regarding the presentation’s structure and content:

  • Anant suggested that the diagrams should be placed before the demo, and that the slides should have fewer bullet points, more images, and slide numbers.
  • He also said that the core unique selling points (USPs) of Drove were not emphasized enough, and that the demo should focus on those USPs.
  • Additionally, he suggested clearly stating the motivation for building Drove and its design philosophy at the beginning of the presentation.

Chandrapal Badshah had two main concerns

Secrets management: He questioned the claim that secrets are not accessible, as he saw an auth token in the logs. He suggested separating PhonePe’s internal architecture from the open-source parts, especially regarding security.
Control Plane: Chandrapal found it difficult to visualize Drove’s control plane and data plane, and suggested explaining it more clearly using examples like Google Nest.

Santanu and Vishnu responded that:

  1. Drove currently supports only basic auth, and they plan to add support for other auth providers in the open-source version.
  2. They use RBAC internally and will consider adding it to the open-source version.
  3. The complexity and potential vulnerabilities of implementing various auth methods are a concern.
  4. They acknowledged the feedback and agreed to add a slide explaining the basic auth limitation.

Sayan Chowdhury’s feedback

  • Sayan found the initial slides confusing and suggested adding more context and visual aids.
  • He recommended reducing the amount of text on the slides and using images to make the presentation more engaging.
  • He also suggested moving the QR code to the question slide and adding a license to the repositories.
  • Finally, he proposed creating a few issues on the repository to encourage audience participation and contributions.
