
@erranlli
Contributor

Summary

Implements graceful degradation for prompts exceeding max_prompt_length, preventing training crashes.

Problem

Training crashed when prompts exceeded the max length:

    Exception: Trajectory {idx}: initial prompt length 3302 already exceeded max_prompt_length 2048, retrying

Solution

  • ✅ Overlong prompts return None and are skipped gracefully
  • ✅ Batch size dynamically adjusts to match the number of surviving trajectories

Key Changes

  1. agent_execution_engine.py: Return None for overlong prompts instead of crashing (sketched below)
  2. agent_ppo_trainer.py: Track skipped indices and filter batch to match
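
A minimal sketch of the engine-side behavior, assuming the prompt is already tokenized before rollout; the helper name and logging here are illustrative rather than the actual agent_execution_engine.py code:

    import logging

    logger = logging.getLogger(__name__)

    def prompt_is_overlong(prompt_ids, max_prompt_length):
        """Return True when the prompt should be skipped rather than raised on."""
        if len(prompt_ids) > max_prompt_length:
            logger.warning(
                "Skipping trajectory: prompt length %d exceeds max_prompt_length %d",
                len(prompt_ids),
                max_prompt_length,
            )
            return True
        return False

    # Inside the engine, the trajectory builder would then do roughly:
    #   if prompt_is_overlong(prompt_ids, max_prompt_length):
    #       return None  # caller treats None as "skipped", not as an error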

Benefits

  • Training continues instead of failing
  • No NaN gradients from division by zero
  • Dynamic batch size adjustment (sketched below)
  • Clean and simple implementation
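
The batch-side adjustment can be pictured with a small sketch; filter_skipped below is an illustrative helper, not the actual agent_ppo_trainer.py code, and it assumes trajectories arrive as a list parallel to the batch rows:

    def filter_skipped(trajectories, batch_rows):
        """Drop rows whose trajectory was skipped so the batch stays aligned."""
        skipped_indices = [i for i, t in enumerate(trajectories) if t is None]
        kept = [(t, row) for t, row in zip(trajectories, batch_rows) if t is not None]
        if not kept:
            # An all-skipped batch would otherwise divide by zero when the loss
            # is averaged, which is where NaN gradients would come from.
            return [], [], skipped_indices
        kept_trajectories, kept_rows = map(list, zip(*kept))
        return kept_trajectories, kept_rows, skipped_indices

    # Example: one overlong prompt out of three is skipped, batch shrinks to match.
    trajs = [{"id": 0}, None, {"id": 2}]
    rows = ["row0", "row1", "row2"]
    kept_trajs, kept_rows, skipped = filter_skipped(trajs, rows)
    # kept_trajs == [{'id': 0}, {'id': 2}], kept_rows == ['row0', 'row2'], skipped == [1]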

Testing

bash examples/deepscaler/test_graceful_degradation.sh

Expected: Training continues with warnings when the 3302-token prompt is encountered; no crashes.

Files Changed

  • rllm/engine/agent_execution_engine.py (graceful degradation)
  • rllm/trainer/verl/agent_ppo_trainer.py (batch alignment)

@LianShuQuan
Contributor

Given this in generate_agent_trajectories_async():

                async for item in self.agent_execution_engine.trajectory_generator(timing_raw=timing_raw, mode=mode, meta_info=meta_info):
                    # This item cannot be None; overlong prompts are skipped instead.
                    queue.put(item)

and because None is already handled here:

            if item is None:
                break

the None check added in generate_agent_trajectory():

                for trajectory in gen_seq_generator:
                    # Skip None trajectories (overlong prompts)
                    if trajectory is not None:
                        trajectories.append(trajectory)

is not necessary.
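
For reference, a self-contained toy version of the flow being discussed, assuming None is used as an end-of-stream sentinel on the queue (function names here are illustrative): the generator drops overlong prompts before anything is enqueued, so the only None the consumer ever sees is the sentinel, and an extra not-None filter downstream never fires.

    import queue

    def producer(items, q):
        # Stands in for trajectory_generator: overlong prompts are dropped here,
        # so no None is ever enqueued as data.
        for item in items:
            if item is not None:
                q.put(item)
        q.put(None)  # sentinel meaning "no more items", not a skipped trajectory

    def consumer(q):
        trajectories = []
        while True:
            item = q.get()
            if item is None:  # the sentinel check is the only None handling needed
                break
            trajectories.append(item)
        return trajectories

    q = queue.Queue()
    producer([{"id": 0}, None, {"id": 1}], q)  # the None mimics a skipped overlong prompt
    print(consumer(q))  # [{'id': 0}, {'id': 1}]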
