-
Notifications
You must be signed in to change notification settings - Fork 414
[Feature] Gym 'vectorized' envs compatibility #1519
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@albertbou92 @matteobettini should we add a "_reset" key too?
IMO the transforms can take care of it but we should test it
torchrl/envs/libs/gym.py
Outdated
# return ( | ||
# MultiDiscreteTensorSpec(spec.nvec, device=device, dtype=dtype) | ||
# if categorical_action_encoding | ||
# else MultiOneHotDiscreteTensorSpec(spec.nvec, device=device, dtype=dtype) | ||
# ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be removed
# return ( | |
# MultiDiscreteTensorSpec(spec.nvec, device=device, dtype=dtype) | |
# if categorical_action_encoding | |
# else MultiOneHotDiscreteTensorSpec(spec.nvec, device=device, dtype=dtype) | |
# ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait why would it be removed? this is actually very needed and important
(it is used in mpe and vmas) (althoug not supported completely in our losses :( )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the difference between that and a stack of discrete specs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thing is that a MultiDiscrete(2, 2) has a batch-size of 2, but that is being translated to a MultiOneHot of size (4,), which is batch-size incompatible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are envs where you need to select two actions at the same time (let's say one for movement and one for comms)
those 2 actions might have different spaces. for example in vmas movement is of space 4 and comms can be anything (for example 10). why would we stack this?
if the problem here is that they are using multidiscrete to indicate batched envs i think they overloaded multidiscrete then, because that should just be a batched discrete
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works perfectly
disc = MultiDiscrete([[2, 3], [4, 5]])
spec = _gym_to_torchrl_spec_transform(disc, categorical_action_encoding=True)
spec.rand()
With categorical_action_encoding=False
it breaks.
We can fall back on MultiOneHot or similar, but the main goal should be that the shape of MultiDiscrete is unchanged after conversion
why would we stack this?
I'm not saying you need this, it's a proposed change. We need this for async envs in gym which is thing that is important to support. I think that using MultiOneHot when we can just have a batched discrete makes little sense, so let's try to see how we can make that happen.
Quick question: what would you expect from MultiDiscrete([[2, 3], [4, 5]])
to torchrl spec?
A MultiOneHot will be hard to work with bc you don't have a consistent last dim (one is 2 + 4
and the other 3 + 5
if i'm not wrong, so you can't stack these together -> you will have a [2, *undifined]
shape).
Would that make sense to restrict this to be non-one hot (ie discrete) in these cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a comprehensive example after my latest push
from gymnasium.spaces import MultiDiscrete
from torchrl.envs.libs.gym import _gym_to_torchrl_spec_transform
disc = MultiDiscrete([[2, 3], [2, 3]])
print("multi-discrete, gym", disc)
print("nvec shape", disc.nvec.shape)
spec = _gym_to_torchrl_spec_transform(disc, categorical_action_encoding=True)
print("spec, discrete", spec)
print("sample", spec.rand())
print("spec shape", spec.shape)
spec = _gym_to_torchrl_spec_transform(disc, categorical_action_encoding=False)
print("spec, one-hot", spec)
print("sample", spec.rand())
print("spec shape", spec.shape)
disc = MultiDiscrete([[2, 3], [4, 5]])
print("multi-discrete, gym", disc)
print("nvec shape", disc.nvec.shape)
spec = _gym_to_torchrl_spec_transform(disc, categorical_action_encoding=True)
print("spec, discrete", spec)
print("sample", spec.rand())
print("spec shape", spec.shape)
spec = _gym_to_torchrl_spec_transform(disc, categorical_action_encoding=False) # breaks
print("spec, one-hot", spec)
print("sample", spec.rand())
print("spec shape", spec.shape)
the print looks like this
multi-discrete, gym MultiDiscrete([[2 3]
[2 3]])
nvec shape (2, 2)
spec, discrete MultiDiscreteTensorSpec(
shape=torch.Size([2, 2]),
space=BoxList(boxes=[BoxList(boxes=[DiscreteBox(n=2), DiscreteBox(n=2)]), BoxList(boxes=[DiscreteBox(n=3), DiscreteBox(n=3)])]),
device=cpu,
dtype=torch.int64,
domain=discrete)
sample tensor([[0, 0],
[1, 2]])
spec shape torch.Size([2, 2])
spec, one-hot MultiOneHotDiscreteTensorSpec(
shape=torch.Size([2, 5]),
space=BoxList(boxes=[DiscreteBox(n=2), DiscreteBox(n=3)]),
device=cpu,
dtype=torch.int64,
domain=discrete)
sample tensor([[0, 1, 1, 0, 0],
[0, 1, 0, 1, 0]])
spec shape torch.Size([2, 5])
multi-discrete, gym MultiDiscrete([[2 3]
[4 5]])
nvec shape (2, 2)
spec, discrete LazyStackedMultiDiscreteTensorSpec(
shape=torch.Size([2, 2]), device=cpu, dtype=torch.float32, domain=discrete)
sample tensor([[1, 0],
[0, 0]])
spec shape torch.Size([2, 2])
spec, one-hot LazyStackedMultiOneHotDiscreteTensorSpec(
shape=torch.Size([2, -1]), device=cpu, dtype=torch.float32, domain=discrete)
@matteobettini what should we do with the example that breaks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To resolve this issue:
- Can you provide an example of where you expect that things will break and/or the user experience will be negatively impacted. Here is my 2 cents:
- MultiOneHot is a giant hack. Per se, you work with a tensor that is strongly tied to its spec. Without the spec, you cannot know what this tensor represents: the only instance I see where you know how to split it is all cases like
[False, False, True, True, False]
. So because if you're using MultiOneHot you must use the spec wherever you use the tensor, then when you use the simplified version (with OneHot + batch-size) you will also have the spec at hand (because I presume that your code base handles the most complex use case so the simpler one falls into the same requisites). Therefore, anything you do with your tensor can probably be represented with methods from the spec (e.g.spec.encode
,spec.to_numpy
etc) and you very rarely interact directly with the tensor. I do not see anywhere where a control flow would be added likeif isinstance(spec, MultiOneHot): ... ; else ...
. - Along the same line, it isn't easy to work with MultiOneHot and I'm not sure I've ever seen such a data structure anywhere else than torchrl. So from a UX perspective restricting its usage to edge cases isn't a great price to pay.
- It is important also to acknowledge what percentage of users (roughly) will be impacted by which decision. MultiOneHot is to be used by the following subset of users: those who work with more than one agent/env + have action space with heterogeneous action + want to use one-hot encoding (and not discrete). Counting that each of these is roughly 50% of users, deciding that the behaviour changes when agents are homogeneous or heterogeneous impacts at the most 0.5 ** 3 = 0.125 of our user base (and this is assuming that there actually is a negative impact for these people, but I haven't really understood what the negative impact is yet).
- I see the point that a different config will create different classes depending on homogeneity. On the other hand, this is the case in many paradigm (e.g. pytorch dataloaders are from different classes if
num_workers=0
ornum_workers=2
and people are usually not complaining about it much because the way they work with the dataloader isn't impacted).
- MultiOneHot is a giant hack. Per se, you work with a tensor that is strongly tied to its spec. Without the spec, you cannot know what this tensor represents: the only instance I see where you know how to split it is all cases like
The most important thing is what solution we can provide to this problem:
What is the way forward to you? Here are the options I see
- Leave as it. This means that with Gymnasium, vectorized envs will have a
MultiOneHot(num_envs * num_action)
spec which we will have a hard time fitting in the env. - Have a distinct behaviour for gymnasium, gym and other libs like VMAS. This will be a high price to pay in term of UX: we'll need to document that with gym one should expect the conversion to be like X->Y, with VMAS X->Z and with gymnasium X->W. Therefore, we gain the consistency within a domain (eg, MARL) but loose it across libs, which is one feature we want to promote about torchrl I think.
- Use the proposed with the following caveats:
- If a MultiOneHot is created, the user is informed once in the execution that this is the spec they will get. We explicitly say why they should not use one-hot (performance is poorer, performance degrates as the action space increases and stacking heterogeneous creates a structure that is hard to work with)
- In a not so distant future we switch the default to discrete and make batched discrete fall back onto multi-discrete (I think it's already the case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the current proposed solution is the best in order to be able to handle gymnasium vectorised environments that are using MultiDiscrete
spaces to represent the vectorisation dimension.
We can take this route and take some precautions as discussed, for example:
- Document the spec translation mechanism
- Default to categorical action encoding
Example
My example about a potential surprising behaviour is the following (it is in the single-agent domain, so it is not related to MARL):
spec = gym.spec.MultiDiscrete(n1, n2) # Where n1 is for example the number of movement actions and n2 is the number of communication actions
Now, when this is translated in torchrl, with action encoding set to one-hot, it will give the following specs:
if n1 == n2:
OneHotTensorSpec(shape=(2, n1))
else:
MultiOneHotTensorSpec(shape=(n1+n2))
So simplifying the spec to a OneHot in case the action sizes are the same.
That said, I am on board with the current solution for gymnasium spec translation as in the above example we do not have the tools to understand if we are seeing a single environment with multiple discrete actions or a vectorized environment with a single discrete action.
So we can go ahead with the feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool I'll check on some edge cases and ping you with these!
torchrl/envs/libs/gym.py
Outdated
# dtype = ( | ||
# numpy_to_torch_dtype_dict[spec.dtype] | ||
# if categorical_action_encoding | ||
# else torch.long | ||
# ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# dtype = ( | |
# numpy_to_torch_dtype_dict[spec.dtype] | |
# if categorical_action_encoding | |
# else torch.long | |
# ) | |
# dtype = ( | |
# numpy_to_torch_dtype_dict[spec.dtype] | |
# if categorical_action_encoding | |
# else torch.long | |
# ) |
return torch.stack( | ||
[ | ||
_gym_to_torchrl_spec_transform( | ||
spec[i], | ||
device=device, | ||
categorical_action_encoding=categorical_action_encoding, | ||
remap_state_to_observation=remap_state_to_observation, | ||
) | ||
for i in range(len(spec.nvec)) | ||
], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matteobettini wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, is it what we decided in our very long thread.
I think this makes sense to me
torchrl/envs/libs/gym.py
Outdated
return super()._build_env(env, pixels_only=pixels_only, from_pixels=from_pixels) | ||
env = super()._build_env(env, pixels_only=pixels_only, from_pixels=from_pixels) | ||
if num_envs > 0: | ||
return self._async_env([CloudpickleWrapper(lambda: env)] * num_envs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a bit hacky but we need that to make it work with wrappers like Pixels stuff
""" | ||
return done, done | ||
return done, done.any() if not isinstance(done, bool) else done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't particularily like it
One aspect to consider is when we're doing our skip_frame but one of the envs is done and not the others. Then we stop the loop. This means that a bunch of envs silently don't complete the loop. Not sure what to do bc we can't really keep on applying the same action to the other env if it's in auto-reset.
One solution is that if a non-complete done is encountered we raise a warning telling users to use the skip-frame transform instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the situation you described is too common and silently not completing the loop is dangerous.
A bit hacky but cant we create some kind on NO-OP step wrapper (e.g. if action is -1, just returns the last obs, reward etc, otherwise normal step)? and overwrite the actions of the envs that have reached the end of the episode before skip_frame actions are taken.
You do the loop skip_frame times for all envs but for early finishing envs a useless step will be called at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about disabling frame_skip for num_envs > 0, and just say that one needs to use the transform instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I always thought frame_skip as a transform was more consistent with the rest of the library. I thought there was some problem that made it less efficient or sth and that was why the option was also given in the env. But yes, if that is not the case, I think using the transform is the best way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vmoens quick heads up here. I think the frame_skip transform did not handle truncation/termination correctly the last time I checked. Also, some envs natively handle frame_skip and this runs much faster as they don't have to render the frames, e.g. ALE-v5 which skips the frames internally with the frameskip
keyword in the constructor.
|
Name | Max | Mean | Ops | Ops on Repo HEAD
|
Change |
---|---|---|---|---|---|
test_single | 79.2122ms | 78.7944ms | 12.6913 Ops/s | 12.8995 Ops/s | |
test_sync | 0.1224s | 45.3155ms | 22.0675 Ops/s | 22.2528 Ops/s | |
test_async | 80.4515ms | 40.5331ms | 24.6712 Ops/s | 23.8684 Ops/s | |
test_simple | 0.7177s | 0.6508s | 1.5365 Ops/s | 1.5716 Ops/s | |
test_transformed | 0.9062s | 0.8453s | 1.1830 Ops/s | 1.1968 Ops/s | |
test_serial | 1.8966s | 1.8407s | 0.5433 Ops/s | 0.5507 Ops/s | |
test_parallel | 1.7739s | 1.5540s | 0.6435 Ops/s | 0.6681 Ops/s | |
test_step_mdp_speed[True-True-True-True-True] | 0.2021ms | 44.8609μs | 22.2911 KOps/s | 22.5906 KOps/s | |
test_step_mdp_speed[True-True-True-True-False] | 47.6010μs | 25.2501μs | 39.6038 KOps/s | 39.8588 KOps/s | |
test_step_mdp_speed[True-True-True-False-True] | 0.1171ms | 31.6690μs | 31.5766 KOps/s | 32.0356 KOps/s | |
test_step_mdp_speed[True-True-True-False-False] | 45.5010μs | 17.5205μs | 57.0760 KOps/s | 56.8480 KOps/s | |
test_step_mdp_speed[True-True-False-True-True] | 0.1290ms | 46.1845μs | 21.6523 KOps/s | 21.7101 KOps/s | |
test_step_mdp_speed[True-True-False-True-False] | 0.1217ms | 27.0362μs | 36.9875 KOps/s | 37.2698 KOps/s | |
test_step_mdp_speed[True-True-False-False-True] | 0.1125ms | 33.2361μs | 30.0878 KOps/s | 30.1675 KOps/s | |
test_step_mdp_speed[True-True-False-False-False] | 53.4010μs | 19.7912μs | 50.5275 KOps/s | 51.4049 KOps/s | |
test_step_mdp_speed[True-False-True-True-True] | 0.1120ms | 48.0232μs | 20.8233 KOps/s | 21.0208 KOps/s | |
test_step_mdp_speed[True-False-True-True-False] | 0.1056ms | 29.3442μs | 34.0783 KOps/s | 34.9529 KOps/s | |
test_step_mdp_speed[True-False-True-False-True] | 72.9010μs | 33.4896μs | 29.8600 KOps/s | 30.3742 KOps/s | |
test_step_mdp_speed[True-False-True-False-False] | 95.8020μs | 19.5477μs | 51.1569 KOps/s | 51.7583 KOps/s | |
test_step_mdp_speed[True-False-False-True-True] | 79.6020μs | 49.5852μs | 20.1673 KOps/s | 20.1785 KOps/s | |
test_step_mdp_speed[True-False-False-True-False] | 0.1047ms | 30.8171μs | 32.4496 KOps/s | 32.6173 KOps/s | |
test_step_mdp_speed[True-False-False-False-True] | 0.2065ms | 34.8447μs | 28.6988 KOps/s | 28.9441 KOps/s | |
test_step_mdp_speed[True-False-False-False-False] | 45.3010μs | 21.0985μs | 47.3966 KOps/s | 46.9845 KOps/s | |
test_step_mdp_speed[False-True-True-True-True] | 0.1388ms | 48.0814μs | 20.7981 KOps/s | 20.9567 KOps/s | |
test_step_mdp_speed[False-True-True-True-False] | 61.2010μs | 28.8592μs | 34.6510 KOps/s | 34.1684 KOps/s | |
test_step_mdp_speed[False-True-True-False-True] | 0.1072ms | 37.3855μs | 26.7483 KOps/s | 27.1318 KOps/s | |
test_step_mdp_speed[False-True-True-False-False] | 3.5385ms | 21.6908μs | 46.1025 KOps/s | 46.0770 KOps/s | |
test_step_mdp_speed[False-True-False-True-True] | 2.0713ms | 50.3220μs | 19.8720 KOps/s | 20.3033 KOps/s | |
test_step_mdp_speed[False-True-False-True-False] | 0.1121ms | 30.5874μs | 32.6932 KOps/s | 32.8470 KOps/s | |
test_step_mdp_speed[False-True-False-False-True] | 84.8020μs | 39.2792μs | 25.4588 KOps/s | 26.0762 KOps/s | |
test_step_mdp_speed[False-True-False-False-False] | 0.3859ms | 23.6587μs | 42.2678 KOps/s | 43.3858 KOps/s | |
test_step_mdp_speed[False-False-True-True-True] | 0.4875ms | 52.3478μs | 19.1030 KOps/s | 19.6652 KOps/s | |
test_step_mdp_speed[False-False-True-True-False] | 58.4010μs | 32.4342μs | 30.8316 KOps/s | 31.0871 KOps/s | |
test_step_mdp_speed[False-False-True-False-True] | 0.1262ms | 39.5594μs | 25.2785 KOps/s | 25.4728 KOps/s | |
test_step_mdp_speed[False-False-True-False-False] | 53.5010μs | 23.1426μs | 43.2104 KOps/s | 43.5991 KOps/s | |
test_step_mdp_speed[False-False-False-True-True] | 0.1241ms | 52.4501μs | 19.0657 KOps/s | 19.0881 KOps/s | |
test_step_mdp_speed[False-False-False-True-False] | 0.1181ms | 33.8247μs | 29.5642 KOps/s | 29.1140 KOps/s | |
test_step_mdp_speed[False-False-False-False-True] | 61.8010μs | 40.0152μs | 24.9905 KOps/s | 25.1451 KOps/s | |
test_step_mdp_speed[False-False-False-False-False] | 0.1081ms | 24.8976μs | 40.1646 KOps/s | 41.1439 KOps/s | |
test_values[generalized_advantage_estimate-True-True] | 17.5846ms | 13.2842ms | 75.2775 Ops/s | 74.3249 Ops/s | |
test_values[vec_generalized_advantage_estimate-True-True] | 50.0162ms | 41.3913ms | 24.1597 Ops/s | 23.7635 Ops/s | |
test_values[td0_return_estimate-False-False] | 0.3603ms | 0.1993ms | 5.0175 KOps/s | 4.7488 KOps/s | |
test_values[td1_return_estimate-False-False] | 13.1577ms | 12.9413ms | 77.2718 Ops/s | 77.4402 Ops/s | |
test_values[vec_td1_return_estimate-False-False] | 55.4137ms | 42.2792ms | 23.6523 Ops/s | 24.1341 Ops/s | |
test_values[td_lambda_return_estimate-True-False] | 31.9233ms | 31.4179ms | 31.8290 Ops/s | 32.2944 Ops/s | |
test_values[vec_td_lambda_return_estimate-True-False] | 49.1299ms | 41.6726ms | 23.9966 Ops/s | 23.9107 Ops/s | |
test_gae_speed[generalized_advantage_estimate-False-1-512] | 14.6946ms | 11.7893ms | 84.8224 Ops/s | 86.4146 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 6.2168ms | 3.4907ms | 286.4730 Ops/s | 286.4033 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 2.4942ms | 0.4751ms | 2.1048 KOps/s | 2.0675 KOps/s | |
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 63.8304ms | 56.4105ms | 17.7272 Ops/s | 18.1532 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 9.5641ms | 2.8455ms | 351.4349 Ops/s | 350.3652 Ops/s | |
test_dqn_speed | 8.9273ms | 1.8372ms | 544.3052 Ops/s | 531.8064 Ops/s | |
test_ddpg_speed | 11.4405ms | 2.7881ms | 358.6670 Ops/s | 208.2320 Ops/s | |
test_sac_speed | 14.4548ms | 8.1515ms | 122.6765 Ops/s | 119.8423 Ops/s | |
test_redq_speed | 20.2526ms | 15.9463ms | 62.7106 Ops/s | 59.2171 Ops/s | |
test_redq_deprec_speed | 20.1568ms | 12.8555ms | 77.7877 Ops/s | 76.4326 Ops/s | |
test_td3_speed | 10.9874ms | 10.1536ms | 98.4872 Ops/s | 95.0925 Ops/s | |
test_cql_speed | 32.2043ms | 26.0500ms | 38.3878 Ops/s | 36.8682 Ops/s | |
test_a2c_speed | 14.6133ms | 5.2253ms | 191.3762 Ops/s | 182.8786 Ops/s | |
test_ppo_speed | 13.0165ms | 5.6020ms | 178.5087 Ops/s | 165.6281 Ops/s | |
test_reinforce_speed | 12.4208ms | 4.0086ms | 249.4608 Ops/s | 210.2432 Ops/s | |
test_iql_speed | 30.0573ms | 20.8162ms | 48.0394 Ops/s | 40.7590 Ops/s | |
test_sample_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 3.6104ms | 2.6075ms | 383.5036 Ops/s | 358.0251 Ops/s | |
test_sample_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 4.6348ms | 2.7908ms | 358.3201 Ops/s | 342.3177 Ops/s | |
test_sample_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.1771s | 3.3175ms | 301.4292 Ops/s | 342.4683 Ops/s | |
test_sample_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.0559ms | 2.6388ms | 378.9532 Ops/s | 367.8976 Ops/s | |
test_sample_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 4.8975ms | 2.8369ms | 352.5020 Ops/s | 349.1930 Ops/s | |
test_sample_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 6.0387ms | 2.8865ms | 346.4363 Ops/s | 345.1573 Ops/s | |
test_sample_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.6290ms | 2.6721ms | 374.2390 Ops/s | 358.3472 Ops/s | |
test_sample_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 6.6049ms | 2.8923ms | 345.7470 Ops/s | 298.2064 Ops/s | |
test_sample_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 4.4738ms | 2.8987ms | 344.9788 Ops/s | 343.3221 Ops/s | |
test_iterate_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 3.5115ms | 2.7047ms | 369.7229 Ops/s | 291.8763 Ops/s | |
test_iterate_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 5.1296ms | 2.8464ms | 351.3205 Ops/s | 341.7028 Ops/s | |
test_iterate_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 5.2037ms | 2.8343ms | 352.8226 Ops/s | 346.1869 Ops/s | |
test_iterate_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.0206ms | 2.6257ms | 380.8461 Ops/s | 364.6113 Ops/s | |
test_iterate_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 4.7717ms | 2.8511ms | 350.7445 Ops/s | 342.5463 Ops/s | |
test_iterate_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 4.7224ms | 2.8233ms | 354.1965 Ops/s | 338.5478 Ops/s | |
test_iterate_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.3200ms | 2.6074ms | 383.5211 Ops/s | 354.3694 Ops/s | |
test_iterate_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 4.8342ms | 2.7928ms | 358.0600 Ops/s | 345.1946 Ops/s | |
test_iterate_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 5.0015ms | 2.7815ms | 359.5220 Ops/s | 339.1715 Ops/s | |
test_populate_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.2447s | 31.4848ms | 31.7614 Ops/s | 31.0012 Ops/s | |
test_populate_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 0.1436s | 26.8443ms | 37.2518 Ops/s | 36.5031 Ops/s | |
test_populate_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 0.1469s | 29.4377ms | 33.9700 Ops/s | 33.4621 Ops/s | |
test_populate_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.1449s | 27.0529ms | 36.9646 Ops/s | 33.0623 Ops/s | |
test_populate_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 0.1468s | 29.4381ms | 33.9695 Ops/s | 36.1432 Ops/s | |
test_populate_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 0.1486s | 26.8233ms | 37.2810 Ops/s | 32.8302 Ops/s | |
test_populate_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.1418s | 29.1127ms | 34.3493 Ops/s | 35.6897 Ops/s | |
test_populate_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 0.1435s | 26.8081ms | 37.3022 Ops/s | 33.0491 Ops/s | |
test_populate_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 0.1428s | 29.2944ms | 34.1362 Ops/s | 36.9227 Ops/s |
On my mcbook, I get to 14-16K fps with 4 async collectors x 8 envs/collector With a sync collector i reach 20k fps. Compared with just an async env of 32 workers, which plateau at 12K fps. |
Without that how would you be able to do a reset? |
|
@matteobettini |
I have now read the docs at https://gymnasium.farama.org/api/vector/. Given this, we might need to be careful about a few things: rl/torchrl/collectors/collectors.py Line 828 in 147de71
are passing the internal _reset flag that contains the sub_envs to reset. It seems to me that in the wrapper for gym vector envs we should call env.reset() only if all the _reset dimensions are set. This is because it seems that all the other cases for partial resets are handled internally.
Does it make sense? I might be missing some gymnasium knowledge but this is what i gathered from the docs |
This isn't great IMO: does it mean that we must tell collectors to ignore done states then? I was hoping to deprecate that feature in the next release haha. EDIT: It's actually even worse than that. Say one user wants to use a data collector, with or without async gym env. She needs to dig into both classes to understand that the auto-reset of the collector conflicts with the one of the env. So many silent bugs can arise from this! It isn't really about torchrl but about any loop like for _ in range(N):
obs, ... done, ... = env.step()
if any(done):
env.reset(done) That seems quite generic to me. The fact that an env behaves differently when executed in parallel or not is unexpected and weird IMO. But I might be missing something. Curious about your thoughts @pseudo-rnd-thoughts For the GymWrapper, regarding this line I think we can just do an assertion of whether or not all reset are true or all false, and if it isn't the case we raise an exception saying that it isn't supported (ie, cannot partially reset a vectorized env). What about frame-skip? Because we can't control the reset, and we can't avoid doing steps in some sub-envs, that prevents any implementation of frame-skip at our level, right? I think both the internal frame-skip and the transform will not be able to handle that... @albertbou92 want to chime in and share your thoughts? |
One possible solution is to have (yet another!) attribute within the env like |
Another issue is that the auto-reset screws up the next observation of the last step of the env, which can be found in |
The fact that gymnasium vector environments auto-reset makes frame-skipping behave differently but IMO It will not change training behavior that much. It will apply the same action N time with or without a reset in between, as opposed to the normal behavior of finishing the loop if end of episode is reached. However, I agree it seems to complicate the code here and there and messes up with with the current code. Ideally I would think about implementing our own version and allowing for manual reset of subsets of environments, but maybe it is overkill. |
This code should now work through c798307 import torch
from torchrl.envs.libs.gym import GymEnv
if __name__ == "__main__":
env = GymEnv("CartPole-v1", num_envs=4)
print("env", env)
r = env.rollout(100, break_when_any_done=False)
print(r)
done = r[:, :-1]["next", "done"].squeeze(-1)
assert done.any()
# data resulting from step, when it's not done
r_not_done = r[:, :-1]["next"][~done]
# data resulting from step, when it's not done, after step_mdp
r_not_done_tp1 = r[:, 1:][~done]
torch.testing.assert_close(
r_not_done["observation"],
r_not_done_tp1["observation"]
)
# data resulting from step, when it's done
r_done = r[:, :-1]["next"][done]
# data resulting from step, when it's done, after step_mdp and reset
r_done_tp1 = r[:, 1:][done]
assert ((r_done["observation"] - r_done_tp1["observation"]).norm(
dim=-1
) > 1e-1).all(), (
r_done["observation"] - r_done_tp1["observation"]).norm(dim=-1) making the behaviour consistent with other envs |
Sorry, I'm a bit confused by your comment Currently, Gym and Gymnasium async and sync vector implementation have autoreset enabled with no way of disabling it. For the frameskipping, to me, I don't think doing this at the vector level is a good idea but rather it should be done at the sub-environment level. You should be able to apply wrappers to any of the sub-environments used in our async and sync vector environment. Now for the interesting news, in v1.0.0 (soon to be released), we have separated the I find vector environment complex to work out what is happening and I'm happy to talk about this on discord otherwise I go up to London fairly regularly if it is easier to talk in person. |
Thanks for this @pseudo-rnd-thoughts that's very useful. I've made the workaround in this PR and now torchrl sort of "ignores" the auto-reset by caching the data resulting from the reset and delivering the data from the info. When calling reset with the appropriate mask, you will get the cached data. I am probably missing some important things in the API design of gymnasium and I'd be happy to know more and understand why the vectorized envs are written that way. Happy to meet in person when you're in London! |
Thanks for the fast reply, I will send you an email when I'm next travelling to London which might next longer than expected (mid November). I understand what you intend with For the For the observation shape, to optimize observations for neural networks, we batch (concatenate) observations, i.e., to batch dict observation, we provide a dict of batch sub-spaces. See https://gymnasium.farama.org/api/spaces/vector_utils/#gymnasium.vector.utils.batch_space for this documentation about this. I can add more examples to help. I'm what is the function type of |
Thanks for the suggestion. Have a look at this
Why not an observation that matched the one found in the first element of the tuple returned by step, with 0 value for all obs that are not final? What puzzles me is that the data is presented in 2 different ways. But the root "problem" is the choice of auto-resetting the env IMO so any patch to that is going to be unsatisfactory for some users. I personally would have expected for
GymEnv.step always returns a tensordict. All data is presented as pytorch |
@pseudo-rnd-thoughts I guess the hope of users such as me for vec-env is that it behaves like pytorch dataloaders. The use case is different, as dataloaders will deliver the exact same datastructure than vec envs (ie you won't get 16 batches at a time if you ask for 16 workers) but sill, the fact that speeding up my code minimanlly disrupts how I code is an amazing feature. It means I can just ask for more resources and test things locally without being surprised at any point. I'm afraid that the choice of creating 2 different classes for single and vec envs in gymnasium 1.0 goes even further away from the interchangeability we're hoping for. Simply look at the amount of effort it takes for us to cover single and mp envs within a single class in TorchRL. Having 2 different env classes for these will be even more difficult. At the age of jax and functorch, why do we still need to differentiate single and multiprocessed envs? Why can't we vectorize all ops? Like you were saying with envpool, if we were relying on modern solutions we could just tell an env what its batch size is and execute ops in a vectorized way and call it a day. This is the direction that many people want to see (take Isaac or VMAS for instance). |
Good questions particularly about scaling to vmapping over stuff and the problem of scaling code. I think the primary answer would be backward compatibility and this is the way that gym was designed, back in the today when people were just trying to get RL to work with a single environment. But I agree that in the modern age (5 years later), Gym is starting to show its limitations. One of our aims was to separate out Otherwise designing your algorithm to use vector environment to begin with, then I don't think it is easy to scale in the way that you are discussing but I certainly agree that this is an important area of research |
# Conflicts: # test/test_libs.py # torchrl/envs/libs/gym.py
# Conflicts: # docs/source/reference/envs.rst
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM few comments
Merging with the following failing tests:
|
Awesome!🚀🚀 Any pointers to how to use this feature? |
cc @matteobettini @albertbou92