Migrating from a legacy to a next-gen observability platform in a live environment
At eBay, we have been using different versions of the same central observability platform for ~15 years. The teams at eBay rely heavily on this platform to observe their applications and infrastructure.
This platform is so deeply ingrained into our applications and systems that it is non-trivial to replace it.
As the team managing this central platform, it is our responsibility to maintain and enhance it and to ensure its availability for our customers.
Over those ~15 years, the open-source community has made tremendous strides in the observability space. Unfortunately, our platform has not kept pace with these improvements. We have accumulated a large amount of tech debt that is very difficult to pay off. The platform is also very expensive (CAPEX & OPEX) to grow and maintain because of its legacy tech stack.
As new applications get built at eBay, teams recognize the shortcomings of this legacy platform and have started spawning their own instances of newer/open-source monitoring offerings (Prometheus, Elasticsearch, etc.). This defeats the purpose of a centralized monitoring platform that can do things like correlation across the various dependencies to reduce TTD (Time to Detect).
These reasons prompted our team to start building a modern and improved platform that will best serve eBay for its current and future needs.
This talk will be about our journey and will focus on:
* Reasons behind initial failed attempts at trying to build this new platform
* Methods we used to eventually succeed in releasing the new platform, and what we learned along the way
* Complexity of switching the platform in a live production environment without any impact
* Navigating the organizational hierarchy to convince teams at eBay to use this new platform
I am a Principal Product Manager in the Platform Engineering organization at eBay. We build platforms and services that are used by various other teams at eBay to build experiences for our customers.
In my current role, I manage the products within the Observability domain. These include products that are used to monitor the health of various layers in our infrastructure ranging from the datacenter and networks all the way to the user-facing experiences.
We are in an interesting position in this domain at eBay, where we are trying to sunset legacy products and replace them with new monitoring products and platforms. With the increased focus on DevOps in the industry, we are also seeing a lot of activity in the open-source world, which makes it a particularly exciting area to work in.
My role requires me to work closely with various teams at eBay to understand their needs, strategize with the team to decide the long-term vision of our products, define short-term milestones, and execute to meet those milestones.