The Fifth Elephant 2025 Annual Conference CfP


Speak at The Fifth Elephant 2025 Annual Conference

Jayita Bhattacharyya

@JB13

Scaling Test-time Inference Compute & Rise of Reasoning Models

Submitted May 20, 2025

Enabling LLMs to improve their outputs by using more test-time computation is a critical step towards building generally self-improving agents that can operate on open-ended natural language. We study the scaling of inference-time computation in LLMs, with a focus on answering the question: if an LLM is allowed to use a fixed but non-trivial amount of inference-time compute, how much can it improve its performance on a challenging prompt? Answering this question has implications not only for the achievable performance of LLMs, but also for the future of LLM pre-training and how one should trade off inference-time and pre-training compute. Despite its importance, little research has attempted to understand the scaling behaviour of the various test-time inference methods.
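
To make "spending more inference-time compute" concrete, below is a minimal sketch of best-of-N sampling against a verifier, one of the most common test-time scaling strategies. This is an illustrative example, not code from the talk: `generate_candidate` and `reward_score` are hypothetical stand-ins for an LLM sampler and a learned reward model.

```python
# Illustrative sketch: best-of-N sampling as a way to spend extra test-time compute.
# `generate_candidate` and `reward_score` are hypothetical stand-ins, not real APIs.
import random


def generate_candidate(prompt: str, temperature: float = 0.8) -> str:
    # Stand-in for sampling one completion from an LLM.
    return f"candidate answer to {prompt!r} (seed={random.random():.3f})"


def reward_score(prompt: str, candidate: str) -> float:
    # Stand-in for a verifier / reward model scoring a candidate answer.
    return random.random()


def best_of_n(prompt: str, n: int) -> str:
    """Sample n candidates (n = inference-time compute budget) and keep the best-scoring one."""
    candidates = [generate_candidate(prompt) for _ in range(n)]
    scores = [reward_score(prompt, c) for c in candidates]
    return candidates[max(range(n), key=scores.__getitem__)]


if __name__ == "__main__":
    # Larger n means more test-time compute; the question above is how quality scales with it.
    print(best_of_n("Prove that the sum of two even numbers is even.", n=8))
```

Increasing `n` is the simplest knob for trading extra inference compute against answer quality, which is exactly the scaling axis the question above asks about.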

Reasoning LLMs have taken over modern-day AI, with open-source models now challenging closed-source ones: by exploiting favourable scaling behaviour they reach comparable performance with far less compute, opening the door to more rapid development of complex systems. We will explore what goes on behind the scenes of chain-of-thought (CoT) reasoning, and how reward modelling, typically applied through reinforcement learning, shapes these models.
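
As a concrete CoT-flavoured example, here is a minimal sketch of self-consistency: sample several chain-of-thought rollouts and take a majority vote over their final answers, another way to convert extra test-time compute into accuracy. `sample_cot` is a hypothetical stand-in for a CoT-prompted LLM call, not an API from any specific library.

```python
# Illustrative sketch: self-consistency over chain-of-thought samples.
# `sample_cot` is a hypothetical stand-in for a CoT-prompted LLM call.
import random
from collections import Counter


def sample_cot(question: str) -> tuple[str, str]:
    # Stand-in: returns (reasoning_trace, final_answer) from one sampled CoT rollout.
    answer = random.choice(["4", "4", "4", "5"])  # noisy but mostly-correct toy model
    return (f"Step-by-step reasoning for {question!r}...", answer)


def self_consistency(question: str, num_samples: int = 16) -> str:
    """More samples means more test-time compute; vote on the most frequent final answer."""
    answers = [sample_cot(question)[1] for _ in range(num_samples)]
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    print(self_consistency("What is 2 + 2?"))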

