Hello,

This is the official issue for discussing the multi-agent API of torchrl for environments and collectors.

There has been a lot of discussion on whether the agent dimension should be part of the batch_size or not. In this design doc we propose an API in which the agent dimension is added to the specs, but not to the batch_size of an environment. This allows part of the specs to be shared by all agents, such as a global state, a global reward, or a global done, while the same attributes can also be kept per-agent. The snippet below sketches the proposed spec construction:
```python
import torch
from torchrl.data import CompositeSpec, UnboundedContinuousTensorSpec

n_workers = 4
n_vectorized_envs = 32
batch_size = (n_workers, n_vectorized_envs)
n_agents = 2
n_obs_per_agent = 6
n_state_features = 18

state_spec = UnboundedContinuousTensorSpec(
    torch.Size((n_state_features,))
)  # shape = (n_state_features,)
obs_spec = UnboundedContinuousTensorSpec(
    torch.Size((n_obs_per_agent,))
)  # shape = (n_obs_per_agent,)
action_spec = UnboundedContinuousTensorSpec(
    torch.Size((2,))
)  # shape = (2,)
reward_spec = UnboundedContinuousTensorSpec(
    torch.Size((1,))
)  # shape = (1,)
done_spec = UnboundedContinuousTensorSpec(
    torch.Size((1,))
)  # shape = (1,)

# Create multi-agent specs. Here the specs are identical for all agents, but they could differ.
multi_agent_obs_spec = torch.stack([obs_spec.clone() for _ in range(n_agents)], dim=0)  # UnboundedContinuousTensorSpec with shape = (n_agents, n_obs_per_agent)
multi_agent_action_spec = torch.stack([action_spec.clone() for _ in range(n_agents)], dim=0)  # UnboundedContinuousTensorSpec with shape = (n_agents, 2)
multi_agent_reward_spec = torch.stack([reward_spec.clone() for _ in range(n_agents)], dim=0)  # UnboundedContinuousTensorSpec with shape = (n_agents, 1)
multi_agent_done_spec = torch.stack([done_spec.clone() for _ in range(n_agents)], dim=0)  # UnboundedContinuousTensorSpec with shape = (n_agents, 1)

input_spec = CompositeSpec(
    action=multi_agent_action_spec,
).expand(batch_size)
# input_spec.shape = (*batch_size)
# input_spec["action"].shape = (*batch_size, n_agents, 2)

output_spec = CompositeSpec(
    observation=multi_agent_obs_spec,
    done=multi_agent_done_spec,  # or done_spec in case of a single done flag for all agents
    reward=multi_agent_reward_spec,  # or reward_spec in case of a global reward (Dec-POMDP)
    state=state_spec,  # this is shared by all agents
).expand(batch_size)
# output_spec.shape = (*batch_size)
# output_spec["observation"].shape = (*batch_size, n_agents, n_obs_per_agent)
# output_spec["done"].shape = (*batch_size, n_agents, 1)
# output_spec["reward"].shape = (*batch_size, n_agents, 1)
# output_spec["state"].shape = (*batch_size, n_state_features)
```
Collectors will then treat the batch_size dimensions as dimensions of replicated environments (e.g., parallel workers, vectorized envs) and will count a multi-agent frame (carrying data for all agents) as 1 frame.
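
A hedged sketch of this frame accounting with a standard collector (`env` and `policy` are hypothetical placeholders for an environment built from the specs above and a multi-agent policy):

```python
from torchrl.collectors import SyncDataCollector

# Sketch only: `env` and `policy` are hypothetical placeholders.
# Under the proposed convention, one step of all replicated envs counts as
# n_workers * n_vectorized_envs = 128 frames, regardless of n_agents.
collector = SyncDataCollector(
    env,
    policy,
    frames_per_batch=n_workers * n_vectorized_envs,  # one step per replicated env
    total_frames=12_800,
)
for batch in collector:
    # The agent dimension appears only in the leaves, e.g.
    # batch["action"].shape[-2:] == (n_agents, 2), not in batch.batch_size.
    ...
```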
Tagging some folks in my lab for opinions: @smorad @Acciorocketships @janblumenkamp
@vmoens, could you tag the other people interested in multi-agent torchrl?