Sushrut Ikhar

Future Proofing ML Inferencing: High Performance Java meets Scalable Remote Serving

Submitted Nov 16, 2025

In this talk, we present a hybrid ML inference architecture designed for today's performance demands and tomorrow's infrastructure challenges. We combine SIMD-accelerated inference in pure Java—using the Vector API and Fused Multiply-Add (FMA)—with remote inference via TensorFlow Serving, ONNX, and Triton. This allows us to strike the right balance: ultra-low latency and high throughput on critical paths, and flexible, scalable inference for complex models with more relaxed SLAs.

Our Java implementation avoids JNI entirely. Through assembly-level micro-benchmarking, we fine-tune matrix operations to achieve 250% improvements in latency and throughput, while staying fully within the JVM—ensuring portability, debuggability, and operational simplicity.

We don't claim Java beats C++ or Rust. Instead, we show it's now a viable and future-ready option for performance-critical inference—especially in JVM-based stacks where speed, iteration velocity, and cost efficiency all matter.

Why this matters: As models grow and workloads scale, optimizing inference infrastructure is no longer optional. Businesses need architectures that can evolve with hardware, control cloud costs, and meet diverse performance needs. Attendees will take away a blueprint for modern inference—from low-level SIMD tuning in Java to when and how to leverage remote inference—all focused on building scalable, cost-effective, and future-proof ML platforms.
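To make the SIMD angle concrete, here is a minimal sketch (not the speaker's actual implementation) of a dot product—the core of matrix operations in inference—using the incubating Vector API with FMA. The class and method names are illustrative; it requires JDK 16+ with `--add-modules jdk.incubator.vector`.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

public class SimdDot {
    // Pick the widest vector shape the hardware supports (e.g. 256-bit AVX2 = 8 floats).
    private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    static float dot(float[] a, float[] b) {
        FloatVector acc = FloatVector.zero(SPECIES);
        int i = 0;
        int upper = SPECIES.loopBound(a.length);
        // Main loop: one fused multiply-add per lane per iteration.
        for (; i < upper; i += SPECIES.length()) {
            FloatVector va = FloatVector.fromArray(SPECIES, a, i);
            FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
            acc = va.fma(vb, acc); // acc = a * b + acc, typically one FMA instruction
        }
        float sum = acc.reduceLanes(VectorOperators.ADD);
        // Scalar tail for lengths not divisible by the lane count.
        for (; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f, 5f};
        float[] b = {2f, 2f, 2f, 2f, 2f};
        System.out.println(SimdDot.dot(a, b)); // 30.0
    }
}
```

On hardware with FMA support, the JIT compiles `fma` to a single fused instruction, which is where the latency and throughput gains over scalar Java come from.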

This talk targets businesses that need low-latency model serving, machine learning engineers, and tech-savvy enthusiasts.

Bio: Sushrut Ikhar is a seasoned software architect at InMobi with over a decade of experience building and scaling data and machine learning platforms. At InMobi, India’s first unicorn, he leads the Machine Learning Platform team, guiding a group of architects and engineers in designing resilient, high-performance systems that power large-scale AI applications. His core expertise spans the modern data and ML ecosystem, including Spark, Hadoop, Ray, TensorFlow, PyTorch, Airflow, Java, and Databricks, along with experience architecting high-scale, low-latency serving systems.

