Poster Mon, Jul 6, 2026 • 6:30 PM – 8:15 PM PDT HALL A #123

DisPPO: Quantile-Based Distributional Reinforcement Learning for Large Language Models

Zhijian Zhou ⋅ Long Li ⋅ Xuan Zhang ⋅ Zongkai Liu ⋅ Yanting Miao ⋅ Yuchen Liu ⋅ Deshu Chen ⋅ Ke Li ⋅ Xing Sun ⋅ Ruoxi Jiang ⋅ Xiaoyu Tan ⋅ Chao Qu ⋅ Yuan Qi

Abstract
