ACL 2026 · San Diego · 2 July 2026
IJCAI 2026 · Bremen · TBD
Current Advances in LLM Reasoning
A unified, hands-on tour of how well LLMs reason, how to make them reason better, and where the field is heading next.
Overview
As Large Language Models (LLMs) increasingly tackle reasoning-heavy tasks, from mathematics to commonsense to multilingual understanding, researchers face three pressing questions: How well do models reason? How can we make them reason better? And what are the next frontiers in LLM reasoning?
This tutorial answers these questions through a unified view of LLM reasoning. We explore comprehensive evaluation strategies to assess the reasoning abilities of models and discuss two families of methods that improve reasoning: advanced inference-time methods and post-training methods.
The tutorial is designed for both researchers and practitioners seeking actionable insight into LLM reasoning.
Outline
- Part 1: How well can models reason?
Leads: Julia Kreutzer, Akhil Arora, Nearchos Potamitis
Foundations for evaluating reasoning beyond accuracy. Multilingual, low-resource, and robustness benchmarks; evaluation dimensions including consistency, faithfulness, calibration, and variance-aware scoring. The hands-on session uses a Colab notebook to assess reasoning variance across languages and seeds, compute calibration error, and visualize consistency metrics.
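As a rough illustration of two of the evaluation dimensions named above, the sketch below computes an equal-width-bin expected calibration error (ECE) and the spread of accuracy across seeds. It is not the tutorial's notebook code. The function names, the choice of 10 bins, and the input format (one confidence in [0, 1] and one correctness flag per answer) are assumptions made for the example.

```python
import statistics


def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: size-weighted mean of |accuracy - confidence| per bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        ece += (len(items) / n) * abs(accuracy - avg_conf)
    return ece


def seed_accuracy_spread(per_seed_accuracy):
    """Mean and population standard deviation of accuracy over seeds."""
    return statistics.mean(per_seed_accuracy), statistics.pstdev(per_seed_accuracy)
```

A model that reports 0.9 confidence on four answers but gets only three right has an ECE of about 0.15 under this definition. Reporting the per-seed spread next to the mean is one simple form of variance-aware scoring.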
- Part 2: How do we make models reason better?
2.1 Inference-time Reasoning Strategies
Leads: Niket Tandon, Akhil Arora, Nearchos Potamitis
Prompt-based, structured, and agentic reasoning under a single lens: iterative and self-refinement methods, structured prompting via graphs/trees, tool-augmented and agentic reasoning, and dynamic compute allocation. Hands-on: compare iterative, self-consistent, and agentic strategies under latency budgets.
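To make "self-consistent strategies under latency budgets" concrete, here is a minimal sketch of self-consistency as majority voting over sampled answers, stopping early once a wall-clock budget is spent. The `sample_fn` callable is a hypothetical stand-in for whatever model call produces one final answer per sample. The parameter names and default values are assumptions, not part of the tutorial materials.

```python
import time
from collections import Counter


def self_consistent_answer(sample_fn, question, max_samples=8, budget_s=2.0,
                           clock=time.monotonic):
    """Majority vote over sampled answers, bounded by a latency budget.

    sample_fn(question) is assumed to return one final answer string per call.
    Returns (answer, agreement), where agreement is the winning vote share.
    """
    start = clock()
    votes = Counter()
    for _ in range(max_samples):
        # Always draw at least one sample, then stop when the budget is spent.
        if votes and clock() - start >= budget_s:
            break
        votes[sample_fn(question)] += 1
    answer, count = votes.most_common(1)[0]
    return answer, count / sum(votes.values())
```

The agreement ratio doubles as a cheap confidence signal: a low vote share suggests spending more compute or abstaining, which is one way to think about dynamic compute allocation.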
☕ Coffee Break · 30 min
2.2 Post-training & RL-based Reasoning
Leads: Nouha Dziri, Vishrav Chaudhary
Reinforcement-learning and preference-based post-training (RLHF, DPO, reward modeling) for robustness and controllability. We relate these to inference-time strategies and cover recent alignment variants. Hands-on: how reward-model calibration influences reasoning confidence and abstention.
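For readers who have not met DPO before, the sketch below evaluates the standard DPO objective for a single preference pair: the negative log-sigmoid of the scaled difference between the policy's and the reference model's log-probability margins. The inputs are assumed to be summed token log-probabilities of the chosen and rejected responses. The function name and the default `beta=0.1` are illustrative choices, not values taken from the tutorial.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * (policy margin - reference margin)))."""
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy matches the reference model the loss equals log 2, and it falls as the policy raises the chosen response's likelihood relative to the rejected one.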
- Part 3: What are the next frontiers in reasoning?
3.1 Efficient Reasoning
Leads: Akhil Arora, Niket Tandon
Sustainable, cost-effective reasoning via adaptive compute reuse, caching, batching, and scalable orchestration.
3.2 Reasoning in High-Stakes Domains
Leads: Lars Klein
From "reason as hard as possible and guess" to calibrated abstention: training and evaluating models to refuse when in doubt, prioritizing safety and factual reliability.
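One common way to evaluate abstention is selective prediction: the model answers only when its confidence clears a threshold, and we report coverage (the fraction answered) alongside accuracy on the answered subset. The sketch below assumes per-question confidences and correctness flags are already available. The function name and the threshold rule are illustrative, not the session's actual method.

```python
def selective_report(confidences, correct, threshold):
    """Coverage and selective accuracy when abstaining below a confidence threshold.

    Returns (coverage, accuracy); accuracy is None if the model abstains on everything.
    """
    answered = [ok for conf, ok in zip(confidences, correct) if conf >= threshold]
    coverage = len(answered) / len(confidences) if confidences else 0.0
    accuracy = sum(answered) / len(answered) if answered else None
    return coverage, accuracy
```

Sweeping the threshold traces a coverage-versus-accuracy curve. A well-calibrated model should gain accuracy as it abstains more, while a model that merely guesses will not.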
3.3 Low-Resource & Multilingual Reasoning
Leads: Julia Kreutzer, Akhil Arora
Advances and open challenges in multilingual and culturally grounded reasoning — cross-lingual transfer, low-resource adaptation, and inclusive benchmark design.
Speakers
- Akhil Arora, Aarhus University
- Vishrav Chaudhary, Meta Superintelligence Labs
- Julia Kreutzer, Cohere Labs
- Nearchos Potamitis, Aarhus University
- Lars Klein, EPFL
- Nouha Dziri, Allen Institute for AI
- Niket Tandon, Microsoft Research
Materials
- Slides — Coming soon
- Colab notebooks — Coming soon
- Reading list — Coming soon
Citation
Coming soon