At LinkedIn, we serve hundreds of thousands of inferences per second across hundreds of ML models concurrently in our online systems. These models have widely varying system performance characteristics, ranging from lightweight XGBoost models to memory-intensive recommendation models to the newer generative AI models, which are both compute- and memory-intensive. We run them across different hardware profiles, spanning multiple CPU and GPU SKUs. To manage this diversity, we built a performance benchmarking system for ML models at LinkedIn based on the MLPerf Inference benchmark paper. This system plays a crucial role in ensuring optimal performance and resource utilization, and it streamlines the ML model serving process so that ML engineers can launch models seamlessly, without needing to delve into complex hardware configurations.
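To make the MLPerf-style approach concrete, here is a minimal sketch of how such a benchmark harness can measure a model's serving characteristics: it paces requests at a target arrival rate and records per-request latency percentiles and achieved throughput. The `infer` callable, the request list, and all parameter names are hypothetical placeholders, not part of LinkedIn's actual system.

```python
import statistics
import time

def benchmark(infer, requests, target_qps):
    """Send requests at roughly `target_qps` and record per-request latency.

    `infer` is a hypothetical single-request inference callable; in a real
    harness it would be an RPC to the model server under test.
    Returns p50/p99 latency in milliseconds and the achieved throughput.
    """
    interval = 1.0 / target_qps
    latencies = []
    t0 = time.perf_counter()
    for req in requests:
        start = time.perf_counter()
        infer(req)
        end = time.perf_counter()
        latencies.append((end - start) * 1000.0)
        # Pace requests to approximate the target arrival rate.
        time.sleep(max(0.0, interval - (end - start)))
    elapsed = time.perf_counter() - t0
    latencies.sort()
    p99 = latencies[min(len(latencies) - 1, int(0.99 * len(latencies)))]
    return {
        "p50_ms": statistics.median(latencies),
        "p99_ms": p99,
        "achieved_qps": len(requests) / elapsed,
    }
```

A real MLPerf Inference deployment uses a load generator with several traffic scenarios (single-stream, server, offline); this sketch corresponds roughly to the server scenario with a constant arrival rate.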
We further explore the practical applications of the performance benchmarking system:
- Enable ML engineers and data scientists to iterate on and experiment with models faster, without worrying about hardware, performance characteristics, or capacity estimation.
- Reduce costs through increased resource utilization by tuning system configurations.
- Build guardrails to identify and prevent regressions during rollouts of new models and system software.
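The regression guardrail above can be sketched as a simple comparison of a candidate's benchmark results against a baseline run, failing the rollout when latency or throughput degrades beyond a tolerance. The metric names and thresholds here are illustrative assumptions, not LinkedIn's actual guardrail policy.

```python
def check_regression(baseline, candidate,
                     max_latency_regression=0.10, max_qps_drop=0.05):
    """Compare a candidate benchmark run against a baseline run.

    Both arguments are dicts with hypothetical keys "p99_ms" and "qps".
    Returns a list of human-readable failure reasons; an empty list
    means the candidate is within tolerance and may roll out.
    """
    failures = []
    if candidate["p99_ms"] > baseline["p99_ms"] * (1 + max_latency_regression):
        failures.append(
            f"p99 latency regressed: {baseline['p99_ms']:.1f}ms "
            f"-> {candidate['p99_ms']:.1f}ms"
        )
    if candidate["qps"] < baseline["qps"] * (1 - max_qps_drop):
        failures.append(
            f"throughput dropped: {baseline['qps']:.0f} "
            f"-> {candidate['qps']:.0f} qps"
        )
    return failures
```

In practice, such a check would run automatically in the deployment pipeline, blocking the rollout and surfacing the failure reasons when the candidate regresses.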
- Landscape of online ML inference at LinkedIn
- MLPerf Inference benchmarks
- Architecture
- Applications
- Challenges faced and solutions
- Future work and conclusion
- Karan Goyal
- Hareesh Kumar Gajulapalli
- Ameya Karve