Session schedule: Fri, Feb 27, 2026, 10:00 AM – 04:30 PM IST and Sat, Feb 28, 2026, 09:00 AM – 06:00 PM IST.
Kundan Kumar
@shadow_walker9170 BOF facilitator
Submitted Jan 26, 2026
The core problem discussed is the “Memory Wall” in LLM inference: GPU compute throughput has scaled dramatically (roughly 50,000x over the last decade) while memory bandwidth has grown only about 100x, making inference memory-bound rather than compute-bound. The result is idle GPU cores, high latency, and inefficient resource utilization, especially for long-context models and batched serving.
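To make the bottleneck concrete, here is a rough back-of-envelope sketch (not part of the session material) comparing the time to stream model weights from HBM against the time to execute the matching arithmetic for a single decode step. The model size and GPU figures are assumptions chosen only for illustration.

```python
# Back-of-envelope roofline check for single-sequence LLM decode.
# All figures below are illustrative assumptions, not measurements.

PARAMS = 7e9            # assumed 7B-parameter model
BYTES_PER_PARAM = 2     # FP16/BF16 weights
WEIGHT_BYTES = PARAMS * BYTES_PER_PARAM

MEM_BW = 3.35e12        # assumed ~3.35 TB/s HBM bandwidth (H100-class GPU)
PEAK_FLOPS = 990e12     # assumed ~990 TFLOPS dense BF16 peak

# At batch size 1, decoding one token touches every weight once,
# costing roughly 2 FLOPs per weight (multiply + add).
flops_per_token = 2 * PARAMS

t_memory = WEIGHT_BYTES / MEM_BW           # time to stream weights from HBM
t_compute = flops_per_token / PEAK_FLOPS   # time to execute the matmul FLOPs

print(f"memory-bound time per token : {t_memory * 1e3:.2f} ms")
print(f"compute-bound time per token: {t_compute * 1e3:.3f} ms")
print(f"memory/compute gap          : {t_memory / t_compute:.0f}x")
```

Under these assumptions, streaming the weights dominates per-token latency by a few hundred times, which is the idle-core picture described above.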
Under this topic, we intend to cover a few popular techniques for improving memory-usage efficiency in order to unlock the potential of LLMs for long-context and batched workloads.
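As one illustration of where the memory goes in long-context serving, the sketch below estimates the KV-cache footprint per sequence. The model shape (a Llama-2-7B-like configuration) and the context lengths are assumptions made for this example, not figures from the proposal.

```python
# Rough KV-cache footprint for a decoder-only transformer.
# The default shape is an assumed Llama-2-7B-like configuration, for illustration.

def kv_cache_bytes(seq_len: int,
                   n_layers: int = 32,
                   n_kv_heads: int = 32,
                   head_dim: int = 128,
                   bytes_per_elem: int = 2) -> int:
    """Bytes of K and V cached for one sequence of length seq_len (FP16)."""
    # Two tensors (K and V) per layer, each of shape [seq_len, n_kv_heads, head_dim].
    return 2 * n_layers * seq_len * n_kv_heads * head_dim * bytes_per_elem

for ctx in (4_096, 32_768, 128_000):
    gib = kv_cache_bytes(ctx) / 2**30
    print(f"context {ctx:>7,} tokens -> ~{gib:5.1f} GiB of KV cache per sequence")

# At long contexts a single sequence's cache rivals the weights themselves,
# which is why KV-cache management (paging, quantization, eviction) is a
# common lever for memory efficiency in batch serving.
```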
This discussion will benefit:
Kundan Kumar is a final-year Computer Science student at IIT Kanpur. He has worked on KV-caching systems at Nutanix as a visiting researcher. His interests lie at the intersection of systems optimization and AI infrastructure.