SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
The official implementation of SPC (NeurIPS 2025). [arXiv] [Project] [Hugging Face]
Jiaqi Chen, Bang Zhang, Ruotian Ma, Peisong Wang, Xiaodan Liang, Zhaopeng Tu, Xiaolong Li, Kwan-Yee K. Wong.
If you have any questions, please contact me by email: jqchen(at)cs.hku.hk
Please install these requirements:
pip install -r requirements.txt
For inference, please also install vllm (we use version 0.6.6).
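Since the inference code was run with a specific vllm version, it can help to confirm what is installed in your environment before evaluating. The snippet below is a minimal sketch using only the Python standard library; the helper names `installed_version` and `matches_pinned` are ours for illustration and are not part of this repository.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pinned(package, pinned):
    """Return True only if `package` is installed at exactly the `pinned` version."""
    return installed_version(package) == pinned


if __name__ == "__main__":
    # 0.6.6 is the vllm version mentioned above; other versions may or may not work.
    found = installed_version("vllm")
    if found is None:
        print("vllm is not installed")
    elif matches_pinned("vllm", "0.6.6"):
        print("vllm 0.6.6 is installed")
    else:
        print(f"vllm {found} is installed (this README used 0.6.6)")
```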
Please find our generated training data and evaluation datasets here.
data_round0_sft_critic.json contains SFT data and data_round2_rl_critic.json contains data generated in rounds 1 and 2 for RL training.
The three files in data/eval correspond to the datasets used for evaluating the critic.
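Before pointing the training or evaluation scripts at a data file, it may be useful to check its overall shape. The sketch below makes no assumption about the actual schema of the released files: it only reports the top-level JSON type, the number of entries, and the keys of the first record if that record is an object. The function name `summarize_json` is hypothetical, and the file path is passed on the command line because the download location depends on your setup.

```python
import json
import sys
from pathlib import Path


def summarize_json(path):
    """Load a JSON file and return a small summary of its top-level structure."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)

    summary = {
        "type": type(data).__name__,
        "size": len(data) if isinstance(data, (list, dict)) else None,
    }

    # Look at the first record, whether the file is a list or a mapping.
    first = None
    if isinstance(data, list) and data:
        first = data[0]
    elif isinstance(data, dict) and data:
        first = next(iter(data.values()))

    # Only report keys when the first record is itself a JSON object.
    summary["first_record_keys"] = sorted(first) if isinstance(first, dict) else None
    return summary


if __name__ == "__main__":
    # Usage: python inspect_data.py <path to a downloaded .json file>
    if len(sys.argv) != 2:
        sys.exit("usage: python inspect_data.py <path-to-json>")
    print(summarize_json(sys.argv[1]))
```

Running this on each downloaded file gives a quick sanity check that the download is complete and parseable before a long training or evaluation job is started.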
We have uploaded the trained SFT critic model (round 0) and RL critic model (round 2) to Hugging Face!
You can use the provided data to finetune the SFT critic model into the round 2 critic model.
Please modify the data and model paths in the script as needed before running:
bash scripts/rl_critic.sh
After training a critic model or directly using our provided checkpoints, please set the dataset and checkpoint paths in the following script to perform evaluation.
python3 eval/infer_batch.py
@article{chen2025spc,
title={SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning},
author={Chen, Jiaqi and Zhang, Bang and Ma, Ruotian and Wang, Peisong and Liang, Xiaodan and Tu, Zhaopeng and Li, Xiaolong and Wong, Kwan-Yee~K.},
journal={arXiv preprint arXiv:2504.19162},
year={2025}
}
