Rootconf Sessions

Presentations on DevOps and Dev-Oops

Make a submission

Accepting submissions till 31 Dec 2020, 11:59 PM

DevOps and Dev-Oops sessions cover:

  1. DevOps (obviously!)
  2. Databases and database disasters
  3. Distributed systems
  4. Systems engineering
  5. Infrastructure security
  6. War stories, failures and anti-patterns with running operations and infrastructure

Rootconf is a forum for:

  1. DevOps engineers
  2. Systems engineers
  3. Software engineers
  4. DevOps leads
  5. Infrastructure security professionals and experts
  6. Cloud service providers
  7. Providers of the pieces on which an organization’s IT infrastructure runs: monitoring, log management, alerting, etc.
  8. Engineering managers looking to optimize infrastructure and teams

Contact information: If you have questions, call Rootconf on 7676332020 or email rootconf.editorial@hasgeek.com

Hosted by

Rootconf is a community-funded platform for activities and discussions on the following topics: Site Reliability Engineering (SRE); infrastructure costs, including cloud costs, and their optimization; and security, including cloud security.

Amber

Migrating from a legacy to next-gen observability platform in a live environment

Submitted Feb 12, 2020

At eBay, we have been using different versions of the same central observability platform for ~15 years. The teams at eBay rely heavily on this platform to observe their applications and infrastructure.
This platform is so deeply ingrained into our applications and systems that it is non-trivial to replace it.
As the team managing this central platform, it is our responsibility to maintain, enhance and ensure its availability for our customers.
Over those ~15 years, the open source community has made tremendous strides in the observability space. Unfortunately, our platform has not kept pace with these improvements. We have accumulated a large amount of tech debt that is very difficult to pay off. The platform is also very expensive (CAPEX and OPEX) to grow and maintain because of its legacy tech stack.
When new applications are built at eBay, teams recognize the shortcomings of this legacy platform and have started spinning up their own instances of newer open source monitoring offerings (Prometheus, Elasticsearch, etc.). This defeats the purpose of a centralized monitoring platform, which can do things like correlate signals across the various dependencies to reduce TTD (Time to Detect).
These reasons prompted our team to start building a modern, improved platform that will serve eBay's current and future needs.

Outline

This talk will be about our journey and will focus on:

  • Reasons behind our initial failed attempts to build this new platform
  • Methods that eventually let us release the new platform, and what we learned along the way
  • Complexity of switching platforms in a live production environment without any impact
  • Navigating the organizational hierarchy to convince teams at eBay to use the new platform

Speaker bio

I am a Principal Product Manager in the Platform Engineering organization at eBay. We build platforms and services that are used by various other teams at eBay to build experiences for our customers.

In my current role, I manage the products within the Observability domain. These include products that are used to monitor the health of various layers in our infrastructure ranging from the datacenter and networks all the way to the user-facing experiences.
We are in an interesting position in this domain at eBay, where we are trying to sunset legacy products and replace them with new monitoring products and platforms. With the increased focus on DevOps in the industry, we are also seeing a lot of activity in the open-source world, which makes it a particularly exciting area to work in.

My role requires me to work closely with various teams at eBay to understand their needs, strategize with the team on the long-term vision for our products, define short-term milestones, and execute to meet those milestones.
