ICML 2026 Tutorials

Tutorial

Unifying Attention and Diffusion with Kan Extension Transformers: Structured Deep Learning with Diagrammatic Backpropagation

Sridhar Mahadevan

Jul 6, 9:00 AM - 11:30 AM HALL C

Modern foundation models are powerful, but their representations, training dynamics, and agentic workflows remain difficult to audit, compose, and trust. This tutorial presents a categorical and geometric framework for trustworthy foundation-model systems. The major scientific components of the tutorial include:

- **Diagrammatic Backpropagation** (DB), which generalizes deep learning to include curvature loss functions over categorical diagrams

- **Infinitesimal Causality** (IC), which generalizes the chain rule in calculus to functors in tangent categories

- **Kan Extension Transformers** (KET), which define a structured computation substrate, unifying attention and diffusion, and providing a universal machine learning framework for mapping finite experience into infinite futures

- **Universal Decision Learning** (UDL), which is a rigorous categorical framework for building foundries, or building blocks of foundation models

- **Lie-algebra based neural adapters** (ALLORA), which shows how to compose LoRA adapters by detecting non-commutativity using Lie brackets

- **Agentic skill optimization using Lie Algebroids** (LASKO), which formalizes optimization over tangent Markdown categories

- **Odyssey**: a demonstration system for automatic foundry construction.

The tutorial is designed as a conceptual 2.5-hour overview. Technical details are deferred to associated arXiv papers and the *Categories for AGI* book. Participants will leave with a solid understanding of a powerful categorical and geometric design language for foundation-model systems that learn locally, transfer cautiously, expose obstructions, and glue global conclusions only when the evidence permits.

Tutorial

Diffusion and Flow-Matching: From Memorization to Generalization & Beyond

Mathurin Massias ⋅ Quentin Bertrand

Jul 6, 9:00 AM - 11:30 AM HALL D1

Tutorial

Unlearning Data at Scale

Vinith Suriyakumar ⋅ Gautam Kamath ⋅ Ashia Wilson

Jul 6, 9:00 AM - 11:30 AM AUDITORIUM

Tutorial

Probabilistic Numerics — Computation is Machine Learning

Philipp Hennig ⋅ Marvin Pförtner ⋅ Tim Weiland

Jul 6, 9:00 AM - 11:30 AM HALL D2

Machine learning is the process of estimating latent representations or variables from *finite data*. If the data is insufficient, this inference process leaves a finite *estimation error*. Probabilistic (Bayesian) machine learning attempts to capture this empirical uncertainty in a probability distribution.

But what actually happens inside of a Learning Machine, the computational side of ML, is invariably the solution of a *numerical problem*: *Optimisation* for deep learning, solving *differential equations* for diffusion, flow matching, and scientific simulation, or even just (large-scale, approximate) numerical *linear algebra*. These numerical tasks have no analytic solution in reach. The computational resources are insufficient, and so the computation leaves a finite *computational error*. **Probabilistic numerical methods attempt to capture this computational uncertainty in a probability distribution.**

By matching the mathematical modelling language of the empirical and the computational side of machine learning in this way, probabilistic numerical methods open new opportunities for computational savings, and new functionality in the ML stack: Computational and data uncertainty can be controlled in relation to each other, and information from data can flow "backwards" through a computation to solve inverse problems. A growing research community within ML is developing this toolchain, typically by building on established, highly efficient, classic numerical methods.
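
As a rough illustration of what "computational uncertainty in a probability distribution" can look like, the sketch below uses Bayesian quadrature, one well-known probabilistic numerical method, to estimate an integral over [0, 1] from a handful of function evaluations. It is not taken from the tutorial material. The Gaussian-process prior (zero mean, RBF kernel), the lengthscale `ell`, the node placement, and all function names are assumptions made for this example.

```python
import math
import numpy as np

# Element-wise error function built from the standard library.
_erf = np.vectorize(math.erf)

def rbf_kernel(a, b, ell):
    """RBF kernel matrix k(a_i, b_j) = exp(-(a_i - b_j)^2 / (2 ell^2))."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def kernel_mean(x, ell):
    """Closed-form integral of k(t, x_i) over t in [0, 1], for each node x_i."""
    s = math.sqrt(2.0) * ell
    return ell * math.sqrt(math.pi / 2.0) * (_erf((1.0 - x) / s) + _erf(x / s))

def prior_integral_variance(ell):
    """Closed-form double integral of k(t, t') over [0, 1]^2."""
    s = math.sqrt(2.0) * ell
    return (2.0 * ell * math.sqrt(math.pi / 2.0) * math.erf(1.0 / s)
            - 2.0 * ell ** 2 * (1.0 - math.exp(-0.5 / ell ** 2)))

def bayesian_quadrature(f, n_nodes=8, ell=0.3, jitter=1e-10):
    """Posterior mean and variance of the integral of f over [0, 1].

    Assumes a zero-mean GP prior on f with an RBF kernel (hypothetical
    modelling choices for this sketch).
    """
    x = np.linspace(0.0, 1.0, n_nodes)        # evaluation nodes
    y = f(x)                                  # the only "data" the method sees
    K = rbf_kernel(x, x, ell) + jitter * np.eye(n_nodes)
    z = kernel_mean(x, ell)
    weights = np.linalg.solve(K, z)           # quadrature weights K^{-1} z
    mean = float(weights @ y)                 # estimate of the integral
    # Remaining uncertainty after n_nodes evaluations, clipped at zero
    # to guard against round-off.
    var = max(prior_integral_variance(ell) - float(weights @ z), 0.0)
    return mean, var
```

Calling `bayesian_quadrature(lambda t: np.sin(np.pi * t))` returns an estimate that can be compared with the exact value 2/π, together with a variance describing how much the eight evaluations leave undetermined under the assumed prior.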

The tutorial is split into three parts. We will start with a simple worked example to establish key concepts and patterns. A second part will generalise these insights into a design pattern across a large class of numerical tasks. Finally, a hands-on code demo will demonstrate how probabilistic numerical methods work in practice.

Tutorial

Proving Theorems with Lean and Machine Learning

Rémy Degenne ⋅ Wenda Li

Jul 6, 9:00 AM - 11:30 AM HALL B2

AI agents can now write mathematics, including proofs of theorems relevant to Machine Learning, but we can’t trust them yet. Subtle errors might be hidden deep in the reasoning steps, and checking the proofs manually takes a lot of time and expertise.
The Lean theorem prover provides a way to write formal, machine-checkable proofs, giving us high confidence in their correctness. AI systems have managed to reach gold medal level at the International Mathematical Olympiad while producing Lean-checked proofs. Could we get them to write research-level, verified mathematics?

In this tutorial, we introduce Lean and its mathematical library Mathlib, and show how they can be used to write trusted proofs, in particular machine learning theory proofs. We then show how machine learning can help with theorem proving, and present recent advances in AI-assisted formalization.
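
For readers who have not seen Lean before, here is a minimal sketch of what a machine-checked statement looks like. It uses only core Lean 4 (no Mathlib import), and the theorem name `add_comm_example` is made up for this illustration. It is far simpler than the machine learning theory proofs the tutorial refers to.

```lean
-- A statement about natural numbers, proved by appealing to the
-- core library lemma `Nat.add_comm`.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A closed arithmetic fact that Lean accepts by computation.
example : 2 + 2 = 4 := rfl
```

If either proof were wrong, for instance if the statement claimed `a + b = b + a + 1`, Lean would reject the file instead of accepting it silently, which is the property that makes such proofs trustworthy.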

Tutorial

Adaptive Reasoning in LLMs: From Post-Training to Test-Time Learning (partially remote)

Akhil Arora ⋅ Nouha Dziri

Jul 6, 1:30 PM - 4:00 PM HALL C

Tutorial

Calibration: From Predictions to Decisions, Collaboration, and Alignment

Aaron Roth ⋅ Natalie Collina ⋅ Ira Globus-Harris

Jul 6, 1:30 PM - 4:00 PM AUDITORIUM

Tutorial

Evaluating and Training LLMs for Math Copilots and Theorem Proving

Simon Frieder ⋅ Philip Vonderlind

Jul 6, 1:30 PM - 4:00 PM HALL B2

Tutorial

Is numerical optimization theory irrelevant to machine learning practice in 2026?

Mark Schmidt

Jul 6, 1:30 PM - 4:00 PM HALL D1

We are seeing more numerical optimization theory papers published than ever before. These papers often make unrealistic assumptions or propose algorithms that never get adopted. So is all this optimization theory largely useless?

In this tutorial I show how some surprisingly simple optimization ideas can explain a wide variety of the implementation choices we make when training modern deep learning models. Some of these ideas might have let us skip some generations of grad-student descent, or have led to state-of-the-art tricks in modern architectures. On the other hand, I will highlight how some important practical ideas are not explained by optimization theory and where we can go from here.

Here is a list of keywords to get you (and your LLM sidekick) interested in attending: Adam and [*]A[*]d[*]a[*]m[*], Muon and its friends/enemies, critical-ish batch size, the RMSnorm and skip connection love affair, dead ReLUs and living SwiGLU, Schedule-Free and WSD and muP and max\_grad\_norm = 1.0, variance reduction and shuffle=True, and maybe edge-of-stability/catapults/feature-learning. I may also tell you why your second-order stochastic optimization method did not work.
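
For readers unfamiliar with two of the keywords above, Adam and `max_grad_norm = 1.0`, the following is a small NumPy sketch of what they usually refer to in training code: clipping gradients by their global norm, followed by the standard Adam update with bias correction. It is an illustration only, not the tutorial's material. The function names and default hyperparameters are assumptions chosen for this example.

```python
import numpy as np

def clip_by_global_norm(grads, max_grad_norm=1.0):
    """Rescale a list of gradient arrays so their joint L2 norm is at most max_grad_norm."""
    total_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    # Only shrink; gradients already inside the ball are left unchanged.
    scale = min(1.0, max_grad_norm / (total_norm + 1e-6))
    return [g * scale for g in grads], total_norm

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single parameter array; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad ** 2      # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                 # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Usage: clip a gradient of norm 5, then take the first Adam step.
param = np.zeros(2)
m, v = np.zeros(2), np.zeros(2)
clipped, norm_before = clip_by_global_norm([np.array([3.0, 4.0])])
param, m, v = adam_step(param, clipped[0], m, v, t=1)
```

On the first step the bias-corrected update has magnitude close to `lr` in every coordinate regardless of the gradient scale, which is one reason clipping and Adam interact differently than clipping and plain SGD.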

Tutorial

New Techniques for Sequence Prediction: Spectral Filtering and Preconditioning

Elad Hazan ⋅ Annie Marsden

Jul 6, 1:30 PM - 4:00 PM HALL D2
