Current Advances in LLM Reasoning

A unified, hands-on tour of how well LLMs reason, how to make them reason better, and where the field is heading next.

Overview

As Large Language Models (LLMs) increasingly tackle reasoning-heavy tasks, from mathematics to commonsense to multilingual understanding, researchers face three pressing questions: How well do models reason? How can we make them reason better? And what are the next frontiers in LLM reasoning?

This tutorial answers these questions through a unified view of LLM reasoning. We explore comprehensive evaluation strategies to assess the reasoning abilities of models and discuss two families of methods that improve reasoning: advanced inference-time methods and post-training methods.

The tutorial is designed for both researchers and practitioners seeking actionable insight into LLM reasoning.

Outline

Part 1
How well can models reason?
~60 min · 20 min hands-on

Leads: Julia Kreutzer, Akhil Arora, Nearchos Potamitis

Foundations for evaluating reasoning beyond accuracy. Multilingual, low-resource, and robustness benchmarks; evaluation dimensions including consistency, faithfulness, calibration, and variance-aware scoring. The hands-on session uses a Colab notebook to assess reasoning variance across languages and seeds, compute calibration error, and visualize consistency metrics.
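
As a rough illustration of the calibration part of the hands-on, the sketch below computes an equal-width-bin expected calibration error (ECE) from per-example confidences and correctness flags. It is not taken from the tutorial notebook: the function name, the binning scheme, and the default of 10 bins are assumptions made for this example.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |accuracy - confidence| per bin.

    confidences: floats in [0, 1], the model's stated confidence per example.
    correct: booleans, whether each answer was right.
    (Illustrative helper; names and binning are assumptions.)
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Map the confidence to a bin index; a confidence of 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, 1.0 if ok else 0.0))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(a for _, a in members) / len(members)
        # Each bin contributes its confidence/accuracy gap, weighted by bin size.
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# Four answers at 0.9 confidence, three of them correct: the gap is 0.9 - 0.75.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False]))
```

Running the same function separately per language or per seed gives one simple way to compare calibration across the conditions named above.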

Part 2
How do we make models reason better?
~90 min · 30 min hands-on

2.1 Inference-time Reasoning Strategies · 45 min

Leads: Niket Tandon, Akhil Arora, Nearchos Potamitis

Prompt-based, structured, and agentic reasoning under a single lens: iterative and self-refinement methods, structured prompting via graphs/trees, tool-augmented and agentic reasoning, and dynamic compute allocation. Hands-on: compare iterative, self-consistent, and agentic strategies under latency budgets.
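
To make the self-consistent strategy concrete, here is a minimal sketch of majority voting over several sampled answers. The `sample_answer` callable is a hypothetical stand-in for a stochastic model call; it is not part of any particular library, and the scripted answers below are made up for the example.

```python
from collections import Counter


def self_consistency(sample_answer, question, n_samples=5):
    """Sample several answers and return the most frequent one with its vote share.

    sample_answer: hypothetical callable (question -> answer string), standing in
    for one stochastic model call. Ties go to the answer that was sampled first.
    """
    answers = [sample_answer(question) for _ in range(n_samples)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n_samples


# Usage with a scripted sampler instead of a real model.
scripted = iter(["7", "8", "7", "7", "9"])
answer, agreement = self_consistency(lambda q: next(scripted), "What is 3 + 4?", n_samples=5)
print(answer, agreement)
```

Each extra sample costs one more model call, so `n_samples` is the knob that trades accuracy against a latency budget.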

☕ Coffee Break — 30 min

2.2 Post-training & RL-based Reasoning · 45 min

Leads: Nouha Dziri, Vishrav Chaudhary

Reinforcement-learning and preference-based post-training (RLHF, DPO, reward modeling) for robustness and controllability. We relate these to inference-time strategies and cover recent alignment variants. Hands-on: how reward-model calibration influences reasoning confidence and abstention.
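
For readers new to preference-based post-training, the sketch below evaluates the standard DPO objective for a single preference pair, given summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model. The function and argument names, and the default `beta`, are assumptions for illustration; real training code would operate on batched tensors.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair: -log(sigmoid(margin)).

    margin = beta * [(log-ratio of chosen) - (log-ratio of rejected)],
    where each log-ratio is policy log-prob minus reference log-prob.
    """
    margin = beta * (
        (policy_chosen_logp - ref_chosen_logp)
        - (policy_rejected_logp - ref_rejected_logp)
    )
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow for large |m|.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# A policy identical to the reference has zero margin, so the loss is log(2).
print(dpo_loss(-2.0, -2.0, -2.0, -2.0))
# Raising the chosen response relative to the rejected one lowers the loss.
print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```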

Part 3
What are the next frontiers in reasoning?
~30 min

3.1 Efficient Reasoning · 15 min

Leads: Akhil Arora, Niket Tandon

Sustainable, cost-effective reasoning via adaptive compute reuse, caching, batching, and scalable orchestration.

3.2 Reasoning in High-Stakes Domains · 15 min

Leads: Lars Klein

From "reason as hard as possible and guess" to calibrated abstention: training and evaluating models to refuse when in doubt, prioritizing safety and factual reliability.
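
One simple way to picture calibrated abstention is a confidence threshold: answer only when confidence clears the bar, then report how often the model answered (coverage) and how accurate it was when it did. The sketch below assumes confidences are already available per example; the function names, the 0.8 threshold, and the sample data are hypothetical.

```python
def answer_or_abstain(answer, confidence, threshold=0.8):
    """Return the answer if confidence clears the threshold, else None (abstain)."""
    return answer if confidence >= threshold else None


def selective_metrics(predictions, threshold=0.8):
    """Coverage and accuracy-on-answered for a list of (confidence, is_correct).

    Returns (coverage, accuracy); accuracy is None when the model abstains on everything.
    """
    answered = [ok for conf, ok in predictions if conf >= threshold]
    coverage = len(answered) / len(predictions) if predictions else 0.0
    accuracy = sum(answered) / len(answered) if answered else None
    return coverage, accuracy


# Made-up predictions: (confidence, whether the answer was correct).
preds = [(0.95, True), (0.9, True), (0.6, False), (0.4, True)]
print(selective_metrics(preds, threshold=0.8))  # answers only the two confident cases
print(selective_metrics(preds, threshold=0.0))  # answers everything
```

Sweeping the threshold traces out the trade-off between coverage and reliability, which only works as intended if the confidences are reasonably calibrated.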

3.3 Low-Resource & Multilingual Reasoning · 15 min

Leads: Julia Kreutzer, Akhil Arora

Advances and open challenges in multilingual and culturally grounded reasoning — cross-lingual transfer, low-resource adaptation, and inclusive benchmark design.