Open
Description
🐛 Describe the bug
Following the installation tutorial (conda env setup), I ran:
python -m apps.grpo.main --config apps/grpo/qwen3_1_7b.yaml
With the default TORCHSTORE_RDMA_ENABLED=1, I first hit an RDMA-related failure that seems related to this issue.
After disabling RDMA (TORCHSTORE_RDMA_ENABLED=0), the run still failed with:
ActorError: A remote actor call has failed.
AssertionError: Actor object is missing when executing init_backends
Actor object is missing … local_fetcher_actor error
It appears that by the time register_fetcher runs, the policy mesh or its local_fetcher_actor has failed to start, so Monarch reports “actor object is missing” when metric loggers attempt to access it.
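For anyone trying to reproduce this, here is a minimal sketch of a launcher that toggles the RDMA setting and runs the same command. The helper names `build_repro` and `run_repro` are hypothetical and not part of the repository. Only the command line and the `TORCHSTORE_RDMA_ENABLED` variable come from the report above.

```python
import os
import subprocess
import sys

# Module and config path are copied from the command in this report.
REPRO_ARGS = ["-m", "apps.grpo.main", "--config", "apps/grpo/qwen3_1_7b.yaml"]


def build_repro(rdma_enabled: bool = False):
    """Return (argv, env) for the GRPO run with RDMA switched on or off.

    Hypothetical helper: it only assembles the command and environment,
    it does not launch anything.
    """
    env = dict(os.environ)  # copy, so the parent environment is untouched
    env["TORCHSTORE_RDMA_ENABLED"] = "1" if rdma_enabled else "0"
    return [sys.executable, *REPRO_ARGS], env


def run_repro(rdma_enabled: bool = False) -> int:
    """Launch the run from the repository root and return its exit code."""
    argv, env = build_repro(rdma_enabled)
    return subprocess.run(argv, env=env).returncode
```

Calling `run_repro(False)` from the repository root should correspond to the second run described above (RDMA disabled), and `run_repro(True)` to the first.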
Versions
- torch: 2.9.0+cu128
- torchmonarch: 0.1.2
- torchtitan: 0.2.0
- vLLM: 0.10.1.dev0 (commit 6d8d0a24c, built 2025-11-14)
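In case it helps others compare environments, the list above can be regenerated with a short script. This is a sketch that assumes the installed distribution names are `torch`, `torchmonarch`, `torchtitan`, and `vllm`. The function names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed distribution names; adjust if your environment uses different ones.
PACKAGES = ["torch", "torchmonarch", "torchtitan", "vllm"]


def collect_versions(packages=PACKAGES):
    """Map each distribution name to its installed version string."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # Keep missing packages in the report instead of raising.
            report[name] = "not installed"
    return report


def format_report(report):
    """Render the report in the same bullet style as the list above."""
    return "\n".join(f"- {name}: {ver}" for name, ver in report.items())
```

Running `print(format_report(collect_versions()))` in the conda environment prints one bullet per package. It does not report the vLLM commit hash or build date, which still have to be added by hand.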