ACL 2026 · San Diego · 2 July 2026

IJCAI 2026 · Bremen · TBD

Current Advances in LLM Reasoning

A unified, hands-on tour of how well LLMs reason, how to make them reason better, and where the field is heading next.

Overview

As Large Language Models (LLMs) increasingly tackle reasoning-heavy tasks, from mathematics to commonsense to multilingual understanding, researchers face three pressing questions: How well do models reason? How can we make them reason better? And what are the next frontiers in LLM reasoning?

This tutorial answers these questions through a unified view of LLM reasoning. We explore comprehensive evaluation strategies to assess the reasoning abilities of models and discuss two families of methods that improve reasoning: advanced inference-time methods and post-training methods.

The tutorial is designed for both researchers and practitioners seeking actionable insight into LLM reasoning.

Outline

  1. Part 1

    How well can models reason?

    ~60 min · 20 min hands-on

    Leads: Julia Kreutzer, Akhil Arora, Nearchos Potamitis

    Foundations for evaluating reasoning beyond accuracy: multilingual, low-resource, and robustness benchmarks, plus evaluation dimensions including consistency, faithfulness, calibration, and variance-aware scoring. The hands-on session uses a Colab notebook to assess reasoning variance across languages and seeds, compute calibration error, and visualize consistency metrics.
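As a minimal sketch of two quantities the hands-on session mentions, the snippet below computes an expected calibration error (ECE) with equal-width confidence bins, and the mean and spread of accuracy across seeds. The function names, the 10-bin default, and the toy inputs are illustrative assumptions and are not taken from the tutorial's Colab notebook.

```python
from statistics import mean, pstdev


def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: size-weighted mean of |accuracy - confidence| per bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one prediction")
    if any(c < 0.0 or c > 1.0 for c in confidences):
        raise ValueError("confidences must lie in [0, 1]")

    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = mean(c for c, _ in members)
        accuracy = mean(1.0 if ok else 0.0 for _, ok in members)
        ece += (len(members) / total) * abs(accuracy - avg_conf)
    return ece


def seed_spread(accuracy_by_seed):
    """Mean and population standard deviation of accuracy over random seeds."""
    return mean(accuracy_by_seed), pstdev(accuracy_by_seed)


if __name__ == "__main__":
    # Toy data: four answers, all reported at 0.9 confidence, three correct.
    print(expected_calibration_error([0.9] * 4, [True, True, True, False]))
    print(seed_spread([0.5, 0.7]))
```

Reporting the spread next to the mean is one simple way to make scoring variance-aware: two models with the same mean accuracy can differ sharply in how stable they are across seeds or languages.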

  2. Part 2

    How do we make models reason better?

    ~90 min · 30 min hands-on

    2.1 Inference-time Reasoning Strategies · 45 min

    Leads: Niket Tandon, Akhil Arora, Nearchos Potamitis

    Prompt-based, structured, and agentic reasoning under a single lens: iterative and self-refinement methods, structured prompting via graphs/trees, tool-augmented and agentic reasoning, and dynamic compute allocation. Hands-on: compare iterative, self-consistent, and agentic strategies under latency budgets.
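One of the strategies named above, self-consistency, can be sketched as majority voting over several sampled answers, stopped early when a latency budget runs out. This is an illustrative sketch only: `sample_fn` is a hypothetical stand-in for a model call that returns a final answer string, and the parameter names and defaults are assumptions rather than the tutorial's notebook code.

```python
import time
from collections import Counter


def self_consistent_answer(sample_fn, question, max_samples=8, budget_s=1.0,
                           clock=time.monotonic):
    """Majority vote over sampled answers, bounded by a wall-clock budget.

    Always draws at least one sample, then keeps sampling until either
    max_samples is reached or budget_s seconds have elapsed.
    Returns (answer, agreement), where agreement is the winning vote share.
    """
    if max_samples < 1:
        raise ValueError("max_samples must be at least 1")

    start = clock()
    votes = Counter()
    for _ in range(max_samples):
        # Check the budget only after the first sample has been drawn.
        if votes and clock() - start >= budget_s:
            break
        votes[sample_fn(question)] += 1

    answer, count = votes.most_common(1)[0]
    return answer, count / sum(votes.values())


if __name__ == "__main__":
    # Hypothetical sampler that replays a fixed list of answers.
    canned = iter(["4", "5", "4", "4"])
    print(self_consistent_answer(lambda q: next(canned), "2 + 2 = ?",
                                 max_samples=4, budget_s=60.0))
```

The agreement share doubles as a rough confidence signal: a tighter budget yields fewer votes and a noisier estimate, which is the trade-off a latency-constrained comparison would expose.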

    ☕ Coffee Break · 30 min

    2.2 Post-training & RL-based Reasoning · 45 min

    Leads: Nouha Dziri, Vishrav Chaudhary

    Reinforcement-learning and preference-based post-training (RLHF, DPO, reward modeling) for robustness and controllability. We relate these to inference-time strategies and cover recent alignment variants. Hands-on: how reward-model calibration influences reasoning confidence and abstention.
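To make the preference-based objective concrete, here is a small sketch of the per-example DPO loss, computed from summed log-probabilities of a chosen and a rejected response under the policy and under a frozen reference model. The function name, argument names, and the `beta=0.1` default are assumptions chosen for illustration; a real training setup would compute this over batches with an autodiff framework.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * (chosen_ratio - rejected_ratio))).

    Each ratio is log pi_policy(y | x) - log pi_ref(y | x) for that response.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(z)) equals softplus(-z); branch to avoid overflow in exp.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


if __name__ == "__main__":
    # Policy identical to the reference: the loss is log(2).
    print(dpo_loss(-5.0, -5.0, -5.0, -5.0))
    # Policy favors the chosen response more than the reference does: lower loss.
    print(dpo_loss(-4.0, -6.0, -5.0, -5.0))
```

The loss falls below log 2 only when the policy widens the chosen-versus-rejected margin relative to the reference, which is how the reference model keeps the update anchored.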

  3. Part 3

    What are the next frontiers in reasoning?

    ~30 min

    3.1 Efficient Reasoning · 15 min

    Leads: Akhil Arora, Niket Tandon

    Sustainable, cost-effective reasoning via adaptive compute reuse, caching, batching, and scalable orchestration.
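As one hedged illustration of the caching idea, the sketch below memoizes model responses by a hash of the normalized prompt, so repeated reasoning queries are served without a second model call. `CachedReasoner` and `generate_fn` are hypothetical names, and the approach assumes deterministic decoding (for example, greedy decoding), since cached answers would otherwise hide sampling variation.

```python
import hashlib


class CachedReasoner:
    """Wraps a generation function with an in-memory response cache."""

    def __init__(self, generate_fn):
        self._generate = generate_fn  # hypothetical model call: str -> str
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def __call__(self, prompt):
        # Normalize surrounding whitespace so trivially different prompts share a key.
        key = hashlib.sha256(prompt.strip().encode("utf-8")).hexdigest()
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        result = self._generate(prompt)
        self._cache[key] = result
        return result


if __name__ == "__main__":
    reasoner = CachedReasoner(lambda p: "answer to: " + p.strip())
    reasoner("What is 17 * 3?")
    reasoner("  What is 17 * 3?  ")  # served from the cache
    print(reasoner.hits, reasoner.misses)
```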

    3.2 Reasoning in High-Stakes Domains · 15 min

    Leads: Lars Klein

    From "reason as hard as possible and guess" to calibrated abstention: training and evaluating models to refuse when in doubt, prioritizing safety and factual reliability.
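A simple way to evaluate abstention is to answer only when confidence clears a threshold and then report coverage (the fraction of questions answered) alongside accuracy on the answered subset. The sketch below assumes per-question confidence scores are already available; the function name and the toy data are illustrative and do not come from the tutorial.

```python
def selective_metrics(confidences, correct, threshold):
    """Coverage and selective accuracy when abstaining below a confidence threshold.

    Returns (coverage, selective_accuracy). selective_accuracy is None when the
    model abstains on every question.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one prediction")

    # Keep only the correctness labels of questions the model chose to answer.
    answered = [bool(ok) for conf, ok in zip(confidences, correct) if conf >= threshold]
    coverage = len(answered) / len(confidences)
    selective_accuracy = sum(answered) / len(answered) if answered else None
    return coverage, selective_accuracy


if __name__ == "__main__":
    confs = [0.9, 0.6, 0.3, 0.8]
    labels = [True, False, False, True]
    for t in (0.0, 0.7, 0.95):
        print(t, selective_metrics(confs, labels, t))
```

Sweeping the threshold traces a coverage-versus-accuracy curve; a well-calibrated model gains accuracy as it abstains more, while a poorly calibrated one gives up coverage without becoming more reliable.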

    3.3 Low-Resource & Multilingual Reasoning · 15 min

    Leads: Julia Kreutzer, Akhil Arora

    Advances and open challenges in multilingual and culturally grounded reasoning — cross-lingual transfer, low-resource adaptation, and inclusive benchmark design.

Speakers

Materials

Citation

Coming soon