
[Feature Design] Multi-agent API for TorchRL #943

@matteobettini

Hello,

This is the official issue for discussing the multi-agent API of TorchRL for environments and collectors.

There has been a lot of discussion on whether the agent dimension should be part of the batch_size or not.

In this design doc we propose an API in which the agent dimension is added to the specs, but not to the batch_size of an environment. This allows part of the specs to be shared by all agents (e.g. a global state, a global reward, or a global done flag), while still allowing these attributes to be defined per-agent.

import torch
from torchrl.data import CompositeSpec, UnboundedContinuousTensorSpec

n_workers = 4
n_vectorized_envs = 32
batch_size = (n_workers, n_vectorized_envs)

n_agents = 2
n_obs_per_agent = 6
n_state_features = 18

state_spec = UnboundedContinuousTensorSpec(
    torch.Size((
        n_state_features,
    ))
) # shape = (n_state_features,)
obs_spec = UnboundedContinuousTensorSpec(
    torch.Size((
        n_obs_per_agent,
    ))
)  # shape = (n_obs_per_agent,)
action_spec = UnboundedContinuousTensorSpec(
    torch.Size((2,)),
)  # shape = (2,)
reward_spec = UnboundedContinuousTensorSpec(
    torch.Size((1,)),
) # shape = (1,)
done_spec = UnboundedContinuousTensorSpec(
    torch.Size((1,)),
)  # shape = (1,)

# Create multi-agent specs. In this case the specs are identical for all agents, but they could differ.
multi_agent_obs_spec = torch.stack([obs_spec.clone() for _ in range(n_agents)], dim=0)  # UnboundedContinuousTensorSpec with shape = (n_agents, n_obs_per_agent)
multi_agent_action_spec = torch.stack([action_spec.clone() for _ in range(n_agents)], dim=0)  # UnboundedContinuousTensorSpec with shape = (n_agents, 2)
multi_agent_reward_spec = torch.stack([reward_spec.clone() for _ in range(n_agents)], dim=0)  # UnboundedContinuousTensorSpec with shape = (n_agents, 1)
multi_agent_done_spec = torch.stack([done_spec.clone() for _ in range(n_agents)], dim=0)  # UnboundedContinuousTensorSpec with shape = (n_agents, 1)

input_spec = CompositeSpec(
    action=multi_agent_action_spec
).expand(batch_size)
# input_spec.shape = (*batch_size)
# input_spec["action"].shape = (*batch_size, n_agents, 2)

output_spec = CompositeSpec(
    observation=multi_agent_obs_spec,
    done=multi_agent_done_spec,  # or done_spec in case of a single done flag for all agents
    reward=multi_agent_reward_spec,  # or reward_spec in case of a global reward (Dec-POMDP)
    state=state_spec,  # this is shared by all agents
).expand(batch_size)
# output_spec.shape = (*batch_size)
# output_spec["observation"].shape = (*batch_size, n_agents, n_obs_per_agent)
# output_spec["done"].shape = (*batch_size, n_agents, 1)
# output_spec["reward"].shape = (*batch_size, n_agents, 1)
# output_spec["state"].shape = (*batch_size, n_state_features)

Collectors will then treat the batch_size dimensions as dimensions of replicated environments (e.g. parallel workers, vectorized envs) and will count a multi-agent frame (carrying the data of all agents) as a single frame.
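
As a hypothetical usage sketch (make_env and policy below are placeholders, and the frame accounting in the comments is the behavior this proposal asks of collectors, not a current guarantee):

from torchrl.collectors import SyncDataCollector

# Hypothetical collector sketch under the proposed frame counting.
# `make_env` and `policy` are placeholders: an env exposing the specs above
# and a policy mapping "observation" to "action".
collector = SyncDataCollector(
    make_env,
    policy,
    frames_per_batch=n_workers * n_vectorized_envs * 100,  # 100 steps per replicated env
    total_frames=1_000_000,
)
for data in collector:
    # Each frame is one multi-agent transition: per-agent entries keep the
    # agent dimension, e.g. data["reward"] ends with (n_agents, 1).
    ...

This way frames_per_batch and total_frames count environment steps, independently of n_agents.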

Tagging some folks in my lab for opinions: @smorad @Acciorocketships @janblumenkamp

@vmoens, can you tag the other people interested in multi-agent TorchRL?
