Fri, Aug 11, 2023, 09:00 AM – 06:00 PM IST
Hareesh Kumar Gajulapalli
Submitted Jun 30, 2023
At LinkedIn, we serve hundreds of thousands of inferences per second across hundreds of ML models concurrently in our online systems. These models have widely varying system performance characteristics, ranging from lightweight XGBoost models to memory-intensive recommendation models to the newer generative AI models, which are both compute- and memory-intensive. We run them across different hardware profiles, spanning a variety of CPU and GPU SKUs. With these factors in mind, we have built a performance benchmarking system for ML models at LinkedIn, based on the MLPerf Inference Benchmark paper. This system plays a crucial role in ensuring optimal performance and resource utilization. It also streamlines the ML model serving process, allowing ML engineers to launch models seamlessly without having to delve into complex hardware configurations.
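The abstract does not include implementation details, but the core of any such benchmarking system is measuring per-request latency percentiles and sustained throughput for a model under a representative load. Below is a minimal, hypothetical sketch of that idea in Python; `predict` and `payloads` are stand-ins for a model's inference entry point and representative request bodies, and none of this reflects LinkedIn's actual system.

```python
import time

def benchmark(predict, payloads, warmup=10):
    """Measure per-request latency percentiles and throughput for an
    inference callable. Illustrative sketch only; `predict` and
    `payloads` are hypothetical stand-ins, not a real benchmarking API."""
    # Warm up so caches, lazy initialization, or JIT don't skew results.
    for p in payloads[:warmup]:
        predict(p)

    latencies = []
    start = time.perf_counter()
    for p in payloads:
        t0 = time.perf_counter()
        predict(p)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    latencies.sort()
    return {
        "qps": len(payloads) / elapsed,                       # throughput
        "p50_ms": 1000 * latencies[len(latencies) // 2],      # median latency
        "p99_ms": 1000 * latencies[int(len(latencies) * 0.99)],  # tail latency
    }

# Example: benchmark a dummy "model" on 1000 synthetic payloads.
result = benchmark(lambda x: sum(x), [[1.0] * 128 for _ in range(1000)])
```

In practice, tail latency (p99) rather than the mean is what matters for online serving, since a small fraction of slow requests dominates user-visible behavior; that focus on percentile metrics is also central to the MLPerf Inference methodology the abstract cites.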
We will further explore the practical applications of this performance benchmarking system.
Anwesha Sen
@anwesha25 Editor & Promoter
Hello Hareesh, please drop an email at anwesha@hasgeek.com as soon as possible so I can schedule your rehearsal. Thank you!
Nischal HP
@nischalhp Editor
Hello Hareesh Kumar Gajulapalli,
Thank you for your submission. We are reviewing talks and will get back to you with an update shortly.
However, we would recommend having no more than 2 speakers on stage, as it might be a bit jarring an experience for 3 speakers to share the stage during a 30-minute talk. Let us know if that sounds reasonable.
Hareesh Kumar Gajulapalli
Hi Nischal,
Yes, that sounds reasonable. We will plan to have only 2 speakers on stage. Also, let me know if any aspects of the abstract aren't clear; I would be happy to elaborate.