SPC

SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning

The official implementation of SPC (NeurIPS 2025). [arXiv] [Project] [Hugging Face]

Jiaqi Chen, Bang Zhang, Ruotian Ma, Peisong Wang, Xiaodan Liang, Zhaopeng Tu, Xiaolong Li, Kwan-Yee K. Wong.

If you have any questions, please contact me by email: jqchen(at)cs.hku.hk

Environment 🔧

Please install these requirements:

pip install -r requirements.txt

For inference, please also install vllm (we use version 0.6.6).
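As a quick sanity check after installation, the following minimal sketch reports whether the installed vllm matches the version mentioned above. The helper names `installed_version` and `matches_expected` are hypothetical and not part of this repository; treating post-releases such as `0.6.6.post1` as a match is also an assumption.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

EXPECTED_VLLM = "0.6.6"  # version the authors report using


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_expected(found: Optional[str], expected: str = EXPECTED_VLLM) -> bool:
    """True when `found` equals `expected` or extends it (assumed: post-releases count)."""
    return found is not None and (found == expected or found.startswith(expected + "."))


if __name__ == "__main__":
    found = installed_version("vllm")
    if matches_expected(found):
        print(f"vllm {found} matches the expected version")
    else:
        print(f"vllm version is {found!r}; the authors report using {EXPECTED_VLLM}")
```

Other vllm versions may also work; the check only flags a difference from the version stated here.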

Data 📚

Please find our generated training data and evaluation datasets here.

data_round0_sft_critic.json contains SFT data and data_round2_rl_critic.json contains data generated in rounds 1 and 2 for RL training.

The three files in data/eval correspond to the datasets used for evaluating the critic.
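Before training or evaluation, it can help to inspect a downloaded file. The sketch below only assumes the file is valid JSON; the record schema is not documented here, so the code reports whatever top-level structure and field names it finds. The function name `summarize_json` is hypothetical.

```python
import json
from pathlib import Path


def summarize_json(path: str) -> dict:
    """Load a JSON file and report its top-level type, size, and first-record keys."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    summary = {"type": type(data).__name__}
    if isinstance(data, (list, dict)):
        summary["size"] = len(data)
    # If the top level is a list of objects, list the field names of the first one.
    if isinstance(data, list) and data and isinstance(data[0], dict):
        summary["first_record_keys"] = sorted(data[0].keys())
    return summary


# Example (assumes the file sits in the current directory):
# print(summarize_json("data_round0_sft_critic.json"))
```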

Checkpoints 🤗

We have uploaded the trained SFT critic model (round 0) and RL critic model (round 2) to Hugging Face!

Reinforcement Finetuning 🔥

You can use the provided data to finetune the SFT critic model into the round 2 critic model.

Please modify the data and model paths in the script as needed before running:

bash scripts/rl_critic.sh

Evaluation 🚀

To evaluate a critic model, whether one you trained yourself or one of our provided checkpoints, set the dataset and checkpoint paths in the following script and run it:

python3 eval/infer_batch.py

Citation 🌟

@article{chen2025spc,
  title={SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning},
  author={Chen, Jiaqi and Zhang, Bang and Ma, Ruotian and Wang, Peisong and Liang, Xiaodan and Tu, Zhaopeng and Li, Xiaolong and Wong, Kwan-Yee~K.},
  journal={arXiv preprint arXiv:2504.19162},
  year={2025}
}
