Fri, 11 Aug 2023, 09:00 AM – 06:00 PM IST
Hareesh Kumar Gajulapalli
At LinkedIn, we serve hundreds of thousands of inferences per second across hundreds of ML models concurrently in our online systems. These models have very different system performance characteristics, ranging from lightweight XGBoost models to memory-intensive recommendation models to the newer generative AI models, which are both compute- and memory-intensive. We run these models across different hardware profiles, spanning a variety of CPU and GPU SKUs. To account for this diversity, we have built a performance benchmarking system for ML models at LinkedIn, based on the MLPerf Inference Benchmark paper. This system plays a crucial role in ensuring optimal performance and resource utilization, and it streamlines the model serving process, allowing ML engineers to launch models seamlessly without having to delve into complex hardware configurations.
In this talk, we further explore the practical applications of the performance benchmarking system.
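The abstract does not include code, and LinkedIn's internal system is not public. As a rough illustration of the kind of measurement such a system performs, here is a minimal, hypothetical Python sketch in the spirit of the MLPerf Inference methodology: it drives an inference function under concurrent load and reports latency percentiles and achieved throughput. All names here are invented stand-ins, not LinkedIn APIs.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_model_inference(payload):
    """Stand-in for a real model call; sleeps to simulate ~2 ms of compute."""
    time.sleep(0.002)
    return payload


def benchmark(infer_fn, num_queries=1000, concurrency=8):
    """Measure latency percentiles and throughput for an inference function."""

    def timed_call(i):
        start = time.perf_counter()
        infer_fn(i)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_call, range(num_queries)))
    wall_elapsed = time.perf_counter() - wall_start

    latencies.sort()
    return {
        "qps": num_queries / wall_elapsed,                      # achieved throughput
        "p50_ms": latencies[len(latencies) // 2] * 1000,        # median latency
        "p99_ms": latencies[int(len(latencies) * 0.99)] * 1000, # tail latency
    }


if __name__ == "__main__":
    print(benchmark(fake_model_inference))
```

In practice, such a harness would be run per model and per hardware SKU, so that tail-latency and throughput profiles can inform placement and capacity decisions.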