Taming the Tail: Stable LLM Reinforcement Learning via Dynamic Vocabulary Pruning

Published:

Yingru Li, Jiawei Xu, Jiacai Liu, Yuxuan Tong, Ziniu Li, Tianle Cai, Ge Zhang, Qian Liu, Baoxiang Wang.