₹11 Lakh/Month: How We Took the GPU Out of Face Match

19 Jun 2026 (Fri), 02:00 PM – 06:00 PM IST

Submitted May 26, 2026

Submission type: Anchor talk (30 mins)

Face matching is one of the highest-volume workloads in identity verification. At IDfy, a single GPU pod serving 1 RPS cost us ₹3,500/day. After we moved the model to BF16 inference on Intel CPUs via OpenVINO, a CPU pod serving the same 1 RPS cost ₹350/day, with the same turnaround time (TAT), the same throughput, and the same accuracy envelope. At our traffic shape (50 RPS sustained for the peak hour, 10 RPS for the remaining 23 hours), that translates to roughly ₹11 lakh a month in savings on this single workload, before counting the GPU capacity it freed up for workloads that genuinely need it.
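The ₹11 lakh figure can be reproduced from the per-pod prices and the traffic shape. A minimal back-of-envelope sketch, assuming (this is not stated in the talk) that capacity scales hour by hour so the pod count tracks load at 1 RPS per pod, that a month is 30 days, and that each pod's daily price is spread evenly over 24 hours; the constant and function names are illustrative.

```python
import math

# Assumptions (not stated in the talk): capacity scales hour by hour so the
# pod count always matches load at 1 RPS per pod, a month is 30 days, and
# each pod's daily price is spread evenly over 24 hours.
GPU_POD_PER_DAY_INR = 3_500  # GPU pod serving 1 RPS
CPU_POD_PER_DAY_INR = 350    # CPU (OpenVINO, BF16) pod serving 1 RPS
RPS_PER_POD = 1

# Traffic shape from the abstract: 50 RPS for the peak hour, 10 RPS otherwise.
HOURLY_RPS = [50] * 1 + [10] * 23


def pod_hours_per_day(hourly_rps, rps_per_pod=RPS_PER_POD):
    """Pods needed in each hour (rounded up to whole pods), summed over the day."""
    return sum(math.ceil(rps / rps_per_pod) for rps in hourly_rps)


def monthly_cost_inr(pod_per_day_inr, hourly_rps, days=30):
    """Monthly spend: pod-hours/day x days x hourly price (daily price / 24)."""
    return pod_hours_per_day(hourly_rps) * days * pod_per_day_inr / 24


gpu = monthly_cost_inr(GPU_POD_PER_DAY_INR, HOURLY_RPS)
cpu = monthly_cost_inr(CPU_POD_PER_DAY_INR, HOURLY_RPS)
savings = gpu - cpu

print(f"pod-hours/day: {pod_hours_per_day(HOURLY_RPS)}")
print(f"GPU: ₹{gpu:,.0f}/month, CPU: ₹{cpu:,.0f}/month")
print(f"savings: ₹{savings:,.0f}/month (~₹{savings / 1e5:.1f} lakh)")
```

With these inputs the sketch gives 280 pod-hours per day and savings of ₹1,102,500 (about ₹11 lakh) per month, in line with the headline number. The hourly-autoscaling assumption matters: holding capacity at peak all day would change both the pod-hour count and the savings.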

This talk is not a “CPU beats GPU” pitch. It is the operational story of how we got there: the calibration set we built, the operators that refused to quantize cleanly, the one architectural tweak we made so OpenVINO could fuse properly, and the production canary we ran to convince ourselves the accuracy was stable. I’ll also share two more migrations from IDfy’s fleet of 40+ models, including one that failed in production, and the telemetry that caught the failure before users did.
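The abstract does not say how the canary judged accuracy stability. A minimal sketch of one plausible check, assuming the reference GPU path and the candidate CPU path score the same face pairs and that a single similarity threshold decides match vs no-match; `compare_scores`, `CanaryReport`, the 0.5 threshold, and the tolerances are hypothetical, not IDfy's actual values.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CanaryReport:
    n_pairs: int
    mean_abs_drift: float  # average |cpu_score - gpu_score|
    max_abs_drift: float   # worst single-pair score difference
    flip_rate: float       # fraction of pairs whose match/no-match verdict changed

    def passes(self, max_mean_drift=0.01, max_flip_rate=0.001):
        """Hypothetical gate: both drift and verdict flips must stay within tolerance."""
        return self.mean_abs_drift <= max_mean_drift and self.flip_rate <= max_flip_rate


def compare_scores(gpu_scores, cpu_scores, threshold=0.5):
    """Compare similarity scores for the same pairs from the two inference paths."""
    if not gpu_scores or len(gpu_scores) != len(cpu_scores):
        raise ValueError("need two non-empty score lists of equal length")
    pairs = list(zip(gpu_scores, cpu_scores))
    drifts = [abs(c - g) for g, c in pairs]
    flips = sum((g >= threshold) != (c >= threshold) for g, c in pairs)
    return CanaryReport(
        n_pairs=len(pairs),
        mean_abs_drift=mean(drifts),
        max_abs_drift=max(drifts),
        flip_rate=flips / len(pairs),
    )


# Toy scores: the fourth pair sits near the threshold and flips verdict.
gpu_scores = [0.91, 0.12, 0.55, 0.49, 0.80]
cpu_scores = [0.905, 0.125, 0.548, 0.502, 0.801]
report = compare_scores(gpu_scores, cpu_scores)
print(report, "passes:", report.passes())
```

Tracking verdict flips alongside mean drift is deliberate: a small average score shift is harmless for pairs far from the threshold but can still flip decisions on borderline pairs, and flipped decisions are what users actually see.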

Takeaways:

1. A cost-vs-latency decision matrix for GPU vs quantized CPU inference, with the metrics that actually predict whether a migration will survive production.
2. The three quantization failure modes we see most often, and the observability signals that catch them before users do.

Audience:

Production ML and AI engineers, platform and infra teams, and engineering leaders who own inference cost-to-serve at scale.

Bio:

Vivek Kalyanarangan is Sr. Technical Architect, AI at IDfy, where a 20-person team operates 40+ production ML models across biometric authentication, document recognition and OCR, fraud detection, and large-scale NLP. He has 13+ years of experience across analytics, big data, and deep learning.

Author of Quantization and Fast Inference (Manning, MEAP 2026) and the freeCodeCamp course LLMs from Scratch. Contributor to open-source ML and author of published papers.

Enterprise AI in Production