Fix: corrected fsdp in GRPO trainer #3582


Open · wants to merge 22 commits into main

Conversation


@tryumanshow commented Jun 13, 2025

What does this PR do?

Fixes a bug where GRPOTrainer using Fully Sharded Data Parallel (FSDP) with vLLM inference fails with AssertionError: Non-root FSDP instance's _is_root should not have been set yet or should have been set to False during parameter sync.

Fixes: #3394 (🧑‍🤝‍🧑 Co-Locating vLLM w/ training for higher throughput and GPU utilization)

Root Cause

When syncing FSDP parameters to vLLM, summon_full_params was recursively called for every FSDP submodule, causing PyTorch's FSDP internal state to become inconsistent. PyTorch expects only the root FSDP module to perform summon_full_params(recurse=True).
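
For context, the problematic pattern roughly has the following shape (a simplified sketch, not the exact previous implementation): full parameters are gathered once per FSDP-wrapped submodule, which interferes with FSDP's lazy root initialization and can raise the assertion quoted above.

```python
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def gather_params_per_submodule(model):
    # Simplified illustration of the old pattern: summon_full_params is
    # entered separately for each FSDP-wrapped submodule instead of once at
    # the root, which can trip the `_is_root` assertion described above.
    gathered = {}
    for module_name, module in model.named_modules():
        if isinstance(module, FSDP):
            with FSDP.summon_full_params(module, recurse=False, writeback=False):
                for param_name, param in module.named_parameters(recurse=False):
                    gathered[f"{module_name}.{param_name}"] = param.data.clone()
    return gathered
```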

Improvements:

  • Refactors GRPOTrainer._sync_fsdp_params_to_vllm to call FSDP.summon_full_params once at the root (recurse=True), instead of recursively calling it for each FSDP submodule (see the sketch after this list).
  • Prevents assertion errors in multi-GPU FSDP training with vLLM parameter syncing.
  • Ensures memory-efficient, correct traversal for parameter extraction and weight updates to vLLM.
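
A minimal sketch of the intended pattern follows; the actual code in this PR may differ in details, and `update_fn` and the prefix handling are illustrative assumptions, not part of the PR.

```python
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def sync_fsdp_params_to_vllm(model, update_fn):
    # Core of the fix: summon_full_params is entered exactly once, on the
    # root FSDP module, with recurse=True, instead of once per submodule.
    if not isinstance(model, FSDP):
        raise TypeError("expected the root module to be wrapped in FSDP")

    with FSDP.summon_full_params(model, recurse=True, writeback=False):
        for name, param in model.named_parameters():
            # FSDP inserts wrapper prefixes into parameter names; strip them
            # so the names match the keys vLLM expects.
            clean_name = name.replace("_fsdp_wrapped_module.", "")
            # `update_fn(name, tensor)` is a placeholder for whatever pushes a
            # single named weight to vLLM (the server client in server mode,
            # or the colocated engine's weight loader).
            update_fn(clean_name, param.data)
```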

Testing

Click to view fsdp.yaml
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
downcast_bf16: 'no'
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_cpu_ram_efficient_loading: false
  fsdp_forward_prefetch: false
  fsdp_offload_params: false
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sync_module_states: true
  fsdp_use_orig_params: false
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 8
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false

vLLM server mode

click to view config.yaml
# Model arguments
model_name_or_path: Qwen/Qwen2.5-3B-Instruct
model_revision: main
torch_dtype: bfloat16
attn_implementation: flash_attention_2

# Data training arguments
dataset_name: DigitalLearningGmbH/MATH-lighteval
dataset_config: default
dataset_prompt_column: problem
system_prompt: "You are a helpful AI Assistant, designed to provide well-reasoned and detailed responses. You FIRST think about the reasoning process as an internal monologue and then provide the user with the answer. The reasoning process MUST BE enclosed within <think> and </think> tags."

# GRPO trainer config
bf16: true
use_vllm: true
do_eval: false
gradient_accumulation_steps: 1
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
learning_rate: 3.0e-06
log_completions: false
log_level: info
logging_first_step: true
logging_steps: 1
logging_strategy: steps
lr_scheduler_type: cosine
max_prompt_length: 512
max_completion_length: 1024
max_steps: 50
num_generations: 16
num_train_epochs: 1
overwrite_output_dir: true
per_device_eval_batch_size: 4
per_device_train_batch_size: 16
push_to_hub: false
report_to:
- wandb
reward_funcs:
- accuracy
- format
reward_weights:
- 1.0
- 1.0
save_strategy: steps
save_steps: 100
save_total_limit: 1
seed: 42
warmup_ratio: 0.1

# rollout
CUDA_VISIBLE_DEVICES=0 trl vllm-serve --model Qwen/Qwen2.5-3B-Instruct

# train
CUDA_VISIBLE_DEVICES=1,2,3,4 ACCELERATE_LOG_LEVEL=info  accelerate launch --config_file ./accelerate_configs/fsdp.yaml --num_processes 4  open_r1.grpo --config config.yaml

click to view test images

  • GPU Occupation (image)
  • Training Log (image)

vLLM colocate mode

click to view config.yaml
# Model arguments
model_name_or_path: Qwen/Qwen2.5-3B-Instruct
model_revision: main
torch_dtype: bfloat16
attn_implementation: flash_attention_2

# Data training arguments
dataset_name: DigitalLearningGmbH/MATH-lighteval
dataset_config: default
dataset_prompt_column: problem
system_prompt: "You are a helpful AI Assistant, designed to provide well-reasoned and detailed responses. You FIRST think about the reasoning process as an internal monologue and then provide the user with the answer. The reasoning process MUST BE enclosed within <think> and </think> tags."

# GRPO trainer config
bf16: true
use_vllm: true
vllm_mode: "colocate"
vllm_tensor_parallel_size: 4
vllm_gpu_memory_utilization: 0.3
do_eval: false
gradient_accumulation_steps: 1
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
learning_rate: 3.0e-06
log_completions: false
log_level: info
logging_first_step: true
logging_steps: 1
logging_strategy: steps
lr_scheduler_type: cosine
max_prompt_length: 512
max_completion_length: 1024
max_steps: 50
num_generations: 16
num_train_epochs: 1
overwrite_output_dir: true
per_device_eval_batch_size: 4
per_device_train_batch_size: 16
push_to_hub: false
report_to:
- wandb
reward_funcs:
- accuracy
- format
reward_weights:
- 1.0
- 1.0
save_strategy: steps
save_steps: 100
save_total_limit: 1
seed: 42
warmup_ratio: 0.1

# rollout & train
CUDA_VISIBLE_DEVICES=0,1,2,3 ACCELERATE_LOG_LEVEL=info  accelerate launch --config_file ./accelerate_configs/fsdp.yaml --num_processes 4  open_r1.grpo --config config.yaml

click to view test images

  • GPU Occupation (image)
  • Training Log (image)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

This is a new version of #3394 (with the FSDP section partially modified).

CC @qgallouedec

@tryumanshow marked this pull request as draft June 13, 2025 19:36
@tryumanshow marked this pull request as ready for review June 13, 2025 19:58
@kashif requested a review from Copilot June 21, 2025 06:19

@Copilot (Copilot AI) left a comment

Pull Request Overview

This PR fixes a bug in GRPOTrainer related to FSDP parameter syncing with vLLM, avoiding assertion errors by refactoring the synchronization logic.

  • Refactored _sync_fsdp_params_to_vllm to restrict FSDP.summon_full_params with recurse=True to only the root FSDP module.
  • Revised recursive traversal to ensure proper syncing of parameters without reprocessing submodules.
Comments suppressed due to low confidence (1)

trl/trainer/grpo_trainer.py:884

  • Using getattr(module, '_is_root', True) defaults to True, which may inadvertently treat modules without the '_is_root' attribute as root FSDP modules. Consider ensuring that the attribute is explicitly set for non-root FSDP modules to avoid potential unintended behavior.
        if isinstance(module, FSDP) and getattr(module, '_is_root', True):
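
As an illustration of Copilot's point, a stricter check could look like the following hypothetical helper (not code from this PR); note that `_is_root` is an internal FSDP attribute.

```python
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def is_root_fsdp(module) -> bool:
    # Hypothetical stricter variant of the flagged check: only treat a module
    # as the FSDP root once FSDP has explicitly marked it as such, instead of
    # defaulting to True when `_is_root` has not been set yet.
    return isinstance(module, FSDP) and getattr(module, "_is_root", None) is True
```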

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@kashif (Collaborator) commented Jun 23, 2025

would you mind fixing the formatting issue by doing: make precommit in the root of the TRL repo?

@tryumanshow (Author)

> would you mind fixing the formatting issue by doing: make precommit in the root of the TRL repo?

Sure! I'm done with it!

@shirinyamani self-requested a review June 24, 2025 15:09
@shirinyamani (Member)

Hi @tryumanshow, thanks for your contribution!
I want to test your PR, and to do that I want to make sure I understand the flow of your work correctly.

click to view config.yaml

# rollout
CUDA_VISIBLE_DEVICES=0 trl vllm-serve --model Qwen/Qwen2.5-3B-Instruct

# train
CUDA_VISIBLE_DEVICES=1,2,3,4 ACCELERATE_LOG_LEVEL=info  accelerate launch --config_file ./accelerate_configs/fsdp.yaml --num_processes 4  open_r1.grpo --config config.yaml

Here, with open_r1.grpo, are you using the GRPO script from open-r1 as your training script?

@tryumanshow (Author)

> Hi @tryumanshow, thanks for your contribution! I want to test your PR, and to do that I want to make sure I understand the flow of your work correctly.
>
> click to view config.yaml
>
> # rollout
> CUDA_VISIBLE_DEVICES=0 trl vllm-serve --model Qwen/Qwen2.5-3B-Instruct
>
> # train
> CUDA_VISIBLE_DEVICES=1,2,3,4 ACCELERATE_LOG_LEVEL=info  accelerate launch --config_file ./accelerate_configs/fsdp.yaml --num_processes 4  open_r1.grpo --config config.yaml
>
> Here, with open_r1.grpo, are you using the GRPO script from open-r1 as your training script?

Hi, @shirinyamani.

Yes, I used the script from open-r1!

@kashif (Collaborator) commented Jun 28, 2025

@tryumanshow just to replicate, can you also paste your trl env?

@tryumanshow (Author)

> @tryumanshow just to replicate, can you also paste your trl env?

I used trl==0.18.1.

@mcleish7

I had the same error and this fix worked for me.
