Publications
* indicates equal contribution.
2026
Off-Policy Value-Based Reinforcement Learning for Large Language Models
Peng-Yuan Wang*, Ziniu Li*, Tian Xu*, Bohan Yang, Tian-Shuo Liu, ChenYang Wang, Xiong-Hui Chen, Yi-Chen Li, Tianyun Yang, Chongliang Chen, Yang Yu
arXiv preprint arXiv:2603.23355.
2025
2024
Policy Rehearsing: Training Generalizable Policies for Reinforcement Learning
Chengxing Jia, Chenxiao Gao, Hao Yin, Fuxiang Zhang, Xiong-Hui Chen, Tian Xu, Lei Yuan, Zongzhang Zhang, Yang Yu, Zhi-Hua Zhou
In Proceedings of the 12th International Conference on Learning Representations (ICLR), 2024
2023
2022
2021
2020
2019