The Fifth Elephant 2023 Monsoon

On AI, industrial applications of ML, and MLOps


Hareesh Kumar Gajulapalli

Online ML model performance benchmarking at LinkedIn Scale: Implementation & Applications

Submitted Jun 30, 2023

Abstract

At LinkedIn, we serve hundreds of thousands of inferences per second across hundreds of ML models concurrently in our online systems. These models have different system performance characteristics, ranging from lightweight XGBoost models to memory-intensive recommendation models to the newer Generative AI models, which are both compute- and memory-intensive. We run them on a range of hardware profiles spanning different CPU and GPU SKUs. Taking these factors into account, we have built a performance benchmarking system for ML models at LinkedIn based on the MLPerf Inference Benchmark paper. This system plays a crucial role in ensuring optimal performance and resource utilization. It streamlines the ML model serving process, allowing ML engineers to launch models seamlessly without having to delve into complex hardware configurations.
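One measurement the MLPerf Inference paper defines is the Server scenario: queries arrive as a Poisson process, and the reported metric is the highest query rate at which a tail-latency bound still holds. The sketch below is not LinkedIn's implementation; it only illustrates that kind of measurement under stated assumptions. The model server is replaced by a hypothetical simulated single worker with a fixed service time (`simulate_latencies`), the SLO is assumed to be a p99 bound, and the search over offered QPS is a plain binary search. All function names and numbers are illustrative.

```python
import math
import random


def poisson_arrivals(qps: float, num_queries: int, seed: int = 0) -> list[float]:
    """Arrival timestamps (seconds) of a Poisson process: exponential inter-arrival gaps."""
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    for _ in range(num_queries):
        t += rng.expovariate(qps)  # mean gap = 1 / qps
        arrivals.append(t)
    return arrivals


def simulate_latencies(arrivals: list[float], service_time_s: float) -> list[float]:
    """Hypothetical stand-in for a model server: one FIFO worker with a fixed service time."""
    free_at, latencies = 0.0, []
    for a in arrivals:
        start = max(a, free_at)          # wait if the worker is still busy
        free_at = start + service_time_s
        latencies.append(free_at - a)    # queueing delay + service time
    return latencies


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile, p in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def max_qps_under_slo(service_time_s: float, p99_slo_s: float,
                      num_queries: int = 20_000, iters: int = 30) -> float:
    """Binary-search the highest offered QPS whose p99 latency stays within the SLO."""
    lo, hi = 0.0, 1.0 / service_time_s  # one worker saturates at 1 / service_time
    for _ in range(iters):
        mid = (lo + hi) / 2
        latencies = simulate_latencies(poisson_arrivals(mid, num_queries), service_time_s)
        if percentile(latencies, 99) <= p99_slo_s:
            lo = mid   # SLO met: try more load
        else:
            hi = mid   # SLO violated: back off
    return lo
```

For instance, `max_qps_under_slo(0.005, 0.020)` estimates how much Poisson traffic a single 5 ms worker can absorb while keeping p99 latency under 20 ms. In a real harness, `simulate_latencies` would be replaced by timed requests against the model server on each hardware SKU, and the result is well below the naive 200 QPS saturation point because queueing inflates the tail long before the worker is fully busy.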

We also explore the practical applications of the performance benchmarking system:

  • Enable ML engineers and data scientists to iterate and experiment faster with models without worrying about hardware, performance characteristics, or capacity estimation.
  • Reduce costs by tuning system configurations to increase resource utilization.
  • Build guardrails to identify and prevent regressions during rollout of new models and system software.
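Two of these applications, capacity estimation and regression guardrails, can be sketched as simple computations over benchmark results. The Python example below is an illustration, not the system described in the talk: it assumes a hypothetical `BenchmarkResult` record holding the maximum SLO-compliant QPS per host and the p99 latency at a fixed reference load, and the 30% headroom and 5%/10% regression tolerances are illustrative defaults rather than values from the source.

```python
import math
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    """Hypothetical summary of one benchmark run on one host type."""
    max_qps_per_host: float  # highest QPS that met the latency SLO
    p99_latency_ms: float    # p99 latency measured at a fixed reference load


def hosts_needed(peak_qps: float, result: BenchmarkResult, headroom: float = 0.3) -> int:
    """Hosts required to serve peak traffic while keeping `headroom` spare capacity per host."""
    usable_qps = result.max_qps_per_host * (1.0 - headroom)
    return math.ceil(peak_qps / usable_qps)


def regression_check(baseline: BenchmarkResult, candidate: BenchmarkResult,
                     max_qps_drop: float = 0.05, max_latency_rise: float = 0.10) -> list[str]:
    """Return reasons to block a rollout; an empty list means the candidate passes."""
    reasons = []
    if candidate.max_qps_per_host < baseline.max_qps_per_host * (1.0 - max_qps_drop):
        reasons.append("throughput regression")
    if candidate.p99_latency_ms > baseline.p99_latency_ms * (1.0 + max_latency_rise):
        reasons.append("p99 latency regression")
    return reasons
```

For example, with a baseline of 400 QPS per host, serving a 10,000 QPS peak with 30% headroom needs `hosts_needed(10_000, BenchmarkResult(400.0, 25.0))`, which is 36 hosts, and a candidate model or system-software build that drops to 360 QPS per host would be flagged by `regression_check` as a throughput regression before rollout.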

Outline

  • Landscape of online ML inference at LinkedIn
  • MLPerf Inference benchmarks
  • Architecture
  • Applications
  • Challenges faced and solutions
  • Future work and conclusion

Authors / Presenters

  • Karan Goyal
  • Hareesh Kumar Gajulapalli
  • Ameya Karve


Hybrid access (members only)
