Taming the Tail: Stable LLM Reinforcement Learning via Dynamic Vocabulary Pruning
Published:
Yingru Li, Jiawei Xu, Jiacai Liu, Yuxuan Tong, Ziniu Li, Tianle Cai, Ge Zhang, Qian Liu, Baoxiang Wang.