Trust Region Masking for Long-Horizon LLM Reinforcement Learning

Published:

Yingru Li*, Jiacai Liu*, Jiawei Xu*, Yuxuan Tong, Ziniu Li, Qian Liu, Baoxiang Wang.