Submissions
Kundan Kumar

@shadow_walker9170

  • Joined Jan 2026

The Fifth Elephant Pune edition

LLM Inference Optimizations: A Deep Dive into Modern Techniques

Problem statement: The core problem discussed is the “Memory Wall” in LLM inference: GPU computational power has scaled dramatically (roughly 50,000x or more in the last decade), while memory bandwidth has lagged (only 100x growth), making inference memory-bound rather than compute-bound. This leaves GPU cores idle, raises latency, and wastes resources, especially for long-context models and …
  • Submitted
  • 26 Jan 2026
Indicate the track in which your submission fits: Track 1, AI in Software Development Life Cycle (SDLC)
Type of submission: Birds of Feather (BOF) session
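
The memory-bound versus compute-bound distinction in the problem statement can be made concrete with a rough roofline-style calculation. The sketch below is illustrative only and is not taken from the submission: the hardware figures (`PEAK_FLOPS`, `MEM_BANDWIDTH`) and the 7-billion-parameter model are hypothetical, and the cost model assumes that each decode step reads every weight once and performs about 2 FLOPs per parameter per sequence, ignoring KV-cache and activation traffic.

```python
"""Roofline-style estimate of whether LLM decoding is memory-bound.

All numbers are illustrative assumptions, not measurements.
"""

# Hypothetical accelerator (assumed values, not a specific product).
PEAK_FLOPS = 300e12      # peak compute throughput, FLOP/s
MEM_BANDWIDTH = 2e12     # peak memory bandwidth, bytes/s


def ridge_point(peak_flops: float, mem_bandwidth: float) -> float:
    """Arithmetic intensity (FLOPs/byte) above which compute is the limit."""
    return peak_flops / mem_bandwidth


def decode_intensity(n_params: float, bytes_per_param: float, batch_size: int) -> float:
    """Approximate FLOPs per byte for one decode step.

    Simplified model: weights are read once per step and shared by the
    whole batch; each sequence costs ~2 FLOPs per parameter.
    KV-cache and activation traffic are ignored.
    """
    flops = 2.0 * n_params * batch_size
    bytes_moved = n_params * bytes_per_param
    return flops / bytes_moved


def is_memory_bound(intensity: float, peak_flops: float, mem_bandwidth: float) -> bool:
    """True when the workload sits below the ridge point of the roofline."""
    return intensity < ridge_point(peak_flops, mem_bandwidth)


def attainable_flops(intensity: float, peak_flops: float, mem_bandwidth: float) -> float:
    """Roofline bound: the lower of peak compute and bandwidth * intensity."""
    return min(peak_flops, intensity * mem_bandwidth)


if __name__ == "__main__":
    n_params = 7e9          # hypothetical 7B-parameter model
    bytes_per_param = 2.0   # 16-bit weights
    for batch in (1, 16, 256):
        ai = decode_intensity(n_params, bytes_per_param, batch)
        bound = "memory" if is_memory_bound(ai, PEAK_FLOPS, MEM_BANDWIDTH) else "compute"
        used = attainable_flops(ai, PEAK_FLOPS, MEM_BANDWIDTH) / PEAK_FLOPS
        print(f"batch={batch:4d}  intensity={ai:7.1f} FLOPs/byte  "
              f"{bound}-bound  peak utilization <= {used:.1%}")
```

Under these assumptions, single-sequence decoding has an intensity far below the ridge point, so most of the compute units would sit idle while weights stream from memory. Larger batches raise the intensity because the same weight reads are amortized over more tokens.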