
Conversation

matteobettini
Contributor

Description

This PR addresses issue #790.

The changes replace the "reset_workers" flag (only designed for ParallelEnvs wrapping environments with an empty batch_size) with the "_reset" flag, which spans all batch_size dimensions.

This makes it possible to tell the wrapped environments more precisely which dimensions need to be reset.

Accordingly, the reset() methods on EnvBase and ParallelEnv now only check that the indices flagged for reset are not done, instead of asserting not done.any().
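For illustration, a minimal sketch of the new flag in use (assuming parallel_env is a ParallelEnv; the setup is illustrative):

import torch
from tensordict import TensorDict

# "_reset" spans all batch_size dims, with a trailing singleton dim
# matching the "done" convention; flag only the first worker for reset.
_reset = torch.zeros(*parallel_env.batch_size, 1, dtype=torch.bool)
_reset[0] = True
reset_td = parallel_env.reset(
    TensorDict({"_reset": _reset}, batch_size=parallel_env.batch_size)
)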

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 6, 2023
@matteobettini
Contributor Author

matteobettini commented Jan 6, 2023

@vmoens
I will write a few non-regression tests for this feature.

In the meantime, could you have a quick look at the changes I made to collectors.py? I am not yet very familiar with that part of the codebase, and I just want to make sure I didn't miss any subtleties and that the logic remains unchanged.

@vmoens vmoens left a comment

Minor comment but otherwise LGTM

Comment on lines 460 to 468
if tensordict is not None and "_reset" in tensordict.keys():
    self._assert_tensordict_shape(tensordict)
    _reset = tensordict.get("_reset")
else:
    _reset = None

if (_reset is None and tensordict_reset.get("done").any()) or (
    _reset is not None and tensordict_reset.get("done")[_reset].any()
):
Collaborator

What about

        if tensordict is not None:
            self._assert_tensordict_shape(tensordict)
        _reset = tensordict.get("_reset", None)

        if (_reset is None and tensordict_reset.get("done").any()) or (
            _reset is not None and tensordict_reset.get("done")[_reset].any()
        ):

@matteobettini matteobettini Jan 8, 2023

_reset = tensordict.get("_reset", None)
This crashes when tensordict is None.

@vmoens vmoens left a comment

I did not see it, but the CI seems broken; let me see what happened.

@matteobettini
Contributor Author

The current version of the changes deletes the "_reset" flag right after it has been used: EnvBase, ParallelEnv, and SerialEnv will delete the "_reset" entry, if present, right after resetting. This means that external components like collectors only have to worry about setting the flag, not deleting it.
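A minimal sketch of the resulting contract (assuming env is an EnvBase with these changes and _reset is a boolean mask built as above):

td = env.reset(TensorDict({"_reset": _reset}, batch_size=env.batch_size))
# The env consumes the flag while resetting and deletes it afterwards,
# so callers never have to clean it up themselves.
assert "_reset" not in td.keys()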

@matteobettini
Contributor Author

I added some tests; the CI seems broken on some HTTP stuff.

@matteobettini
Contributor Author

While finishing up this PR I stumbled upon an unexpected behavior that I am not sure is intended.

Imagine having an EnvBase where the reset function returns "done" and "observation".

You wrap it in a ParallelEnv with 2 workers and, before anything else, you call reset, asking to reset only the first worker. You will get observations for both workers, but those for the worker that was not reset are random, since that worker never called its reset method (as intended).

Is this expected?
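A minimal sketch of the scenario (make_env is a hypothetical env factory):

parallel_env = ParallelEnv(2, make_env)
_reset = torch.tensor([[True], [False]])  # reset only the first worker
td = parallel_env.reset(TensorDict({"_reset": _reset}, batch_size=[2]))
# td["observation"][0] comes from worker 0's actual reset;
# td["observation"][1] is uninitialized buffer content, since worker 1
# never ran its reset method.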

@vmoens
Collaborator

vmoens commented Jan 8, 2023

While finishing up this PR I stumbled upon an unexpected behavior that I am not sure is intended.

Imagine having an EnvBase where the reset function returns "done" and "observation".

You wrap it in a ParallelEnv with 2 workers and, before anything else, you call reset, asking to reset only the first worker. You will get observations for both workers, but those for the worker that was not reset are random, since that worker never called its reset method (as intended).

Is this expected?

Good point, but what would be the expected behaviour?

@matteobettini
Contributor Author

matteobettini commented Jan 8, 2023

Good point, but what would be the expected behaviour?

The rule I use in vmas is to reset only the envs that need resetting and call env.observation(agent) for all envs and all agents. Given that such an observation function is not available in all libs here, I think we have a few options:

Option 1: call reset() for all workers independently

This calls reset on all the workers and passes the "_reset" flag, which would be all False for workers that don't need resetting. The envs then have to look at the "_reset" flag and, if it is all False, return only the observation without resetting (sketched below).

Cons

Every environment in torchrl would have to implement the logic to handle the _reset flag and this behavior. That might be asking too much.
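A hypothetical sketch of what Option 1 would ask of each env (helper names are illustrative, not the torchrl API):

def _reset(self, tensordict):
    flag = tensordict.get("_reset", None) if tensordict is not None else None
    if flag is not None and not flag.any():
        # Nothing flagged: skip resetting and just report observations.
        return self._current_observations()  # hypothetical helper
    return self._reset_flagged(flag)  # hypothetical helper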

Option 2: Heterogeneous tensordicts

Only return the observation and keys that are actually available. If some workers were not reset, their observation dimension will be 0 and the returned tensordict will be heterogeneous.

Option 3: Use NaNs or other placeholders.

Instead of putting random values for workers that were not reset, use a placeholder convention (NaN, inf, zeros).
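A minimal sketch of such a convention (assuming a (num_workers, 1) "_reset" mask and observations batched over workers):

not_reset = ~_reset.squeeze(-1)  # workers that were not reset
# Overwrite the meaningless entries with a recognizable placeholder.
reset_td["observation"][not_reset] = float("nan")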

Option 4: leave as is

When reset() is called after a step(), the observations for workers that were not reset keep the last values available from the step. The only case where the behavior is undefined is when reset is called before step for only some workers. We could precisely define what happens in this case in the docs and say that one should only access the values at the reset indices:

_reset = torch.randint(0, 2, size=(*parallel_env.batch_size, 1), dtype=torch.bool)
reset_td = parallel_env.reset(TensorDict({"_reset": _reset}, batch_size=parallel_env.batch_size))
reset_td[_reset]  # the only values a user should access

My opinion

In my opinion, 3 and 4 are still the best and least disruptive. They are synonyms of padding, which I hate, but we live in a world where NestedTensors are not yet available. Users have to be smart, though, and careful to know which values will be meaningless.

@vmoens what's your take?

@vmoens
Collaborator

vmoens commented Jan 8, 2023

I agree with you that options 3 and 4 look better than the others.
In general, while I see the issue, I would prefer not to adopt a solution that forces users who do not use parallel envs / multi-agent to reset envs in a way that is more complex than needed.

To build the fake_tensordict that serves as a buffer for the parallel env, we use rand for no particular reason. We could just as well use NaN, zeros, or anything else.

Also, with the collectors we return a "mask" entry that represents which indices of the data are valid. It's a way to make sure users can tell what was or wasn't the result of a reset. Would you consider something like that?
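A hedged sketch of how such a mask could be consumed (the iteration pattern is illustrative):

for data in collector:
    mask = data.get("mask")   # True where a step holds valid data
    valid_steps = data[mask]  # boolean-mask indexing keeps valid steps only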

@matteobettini
Contributor Author

matteobettini commented Jan 8, 2023

That makes sense, yes; we can leave it as is. Does the mask in the collectors already take this problem into account when resetting parallel workers?

@vmoens vmoens added the Refactoring Refactoring of an existing feature label Jan 9, 2023
@vmoens
Collaborator

vmoens commented Jan 9, 2023

That makes sense, yes; we can leave it as is. Does the mask in the collectors already take this problem into account when resetting parallel workers?

No, it just keeps track of padding operations. I mentioned it as a mechanism to track valid steps.

Another thing to consider is this: if your env supports sub-envs that exist but have not been reset (e.g. in some libs you must reset before doing anything else), you could just append a transform that does some step counting. If an env has been reset, the step count will be strictly greater than (or equal to?) 0. If a step count is 0 (or -1?), no step has actually been done.

See this class

e.g.

base_env = MyEnv(..., n_envs=2)
env = TransformedEnv(base_env, StepCounter())
tensordict = env.reset(TensorDict({"_reset": [True, False]}, [2]))
print(tensordict["steps"])  # prints [0, -1]

@riiswa I don't think this behaviour is currently supported, but do you think it would make sense?

@riiswa
Contributor

riiswa commented Jan 9, 2023

@riiswa I don't think this behaviour is currently supported, but do you think it would make sense?

Yes, I think that makes sense. The documentation of the class will need to be written well. Can you create an issue for this and assign me?

@vmoens
Collaborator

vmoens commented Jan 9, 2023

@matteobettini do we want to wait for the discussion above to be resolved before merging, or are we happy with the current state?

@matteobettini
Contributor Author

matteobettini commented Jan 9, 2023

I think this PR can be merged standalone, since it mainly refactors "reset_workers" to "_reset" and puts some mechanisms in place to handle it. If users want to use "_reset", it will only work in envs that support it.

The only libs with sub-envs, to my knowledge, are vmas and brax. vmas supports it. Brax, not having the state during a call to the reset function, cannot return a state that is partially reset and partially the old one. To enable support for brax we would need to pass the state during reset, or leave it as is and users will do:

td = brax.reset()
for _ in range(n_rollout_samples):
    td = brax.step(td_with_action)
    _reset = td["done"]
    td["state"][_reset] = brax.reset()["state"][_reset]

@vmoens vmoens left a comment

LGTM; a few minor issues to solve and we're good to go.

test/test_env.py Outdated
TensorDict({"_reset": _reset}, batch_size=env.batch_size, device=env.device)
)
env.close()
if _reset.any():
Collaborator

How can we make sure that this branch is reached? (I agree there's a low probability that it isn't.)
A few ideas (see the sketch below):

  • put an else: raise RuntimeError at the end
  • same as above, plus repeat the test X times (e.g. X=3) with a different seed if it failed
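A minimal sketch combining both ideas (assuming the test body can be re-seeded and retried):

for attempt in range(3):  # retry with a different seed if nothing was flagged
    torch.manual_seed(attempt)
    _reset = torch.randint(0, 2, size=(*env.batch_size, 1), dtype=torch.bool)
    if _reset.any():
        break
else:
    raise RuntimeError("_reset never flagged any worker across retries")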

Contributor Author

Good point. Does it make sense now?

test/test_env.py Outdated
assert (td["next"]["observation"] == max_steps + 1).all()

_reset = torch.randint(
low=0, high=2, size=(*env.batch_size, 1), dtype=torch.bool
Collaborator

Why is the last dim 1? With the latest tensordict version, the dims of a tensor can match the batch size of a tensordict.

@matteobettini matteobettini Jan 10, 2023

I used the convention of the done flag, which has shape (*batch_size, 1). I adopted this convention for _reset in all the PR files as well.

Collaborator

For "done", we want the last dim to be 1 because we do this kind of thing:

value = reward + value * gamma * (1 - done)

We want reward and done to have a last dim of 1; otherwise they will be broadcast to the size of value, which is usually the output of a neural net like nn.Sequential(..., nn.Linear(..., 1)).
So either we squeeze value, or we unsqueeze reward and done.
Squeezing requires more brain power from the users IMO, whereas unsqueezing can be hidden from them.

For "_reset" I don't think we need that

Contributor Author

For now I have also removed the last dim of 1 from "step_count".

Collaborator

IMO StepCounter should have steps that match the tensordict batch size.
@riiswa

Contributor Author

OK, so like it is now in the PR; I agree.

Contributor Author

In the future it might be worth considering forcing one of the two conventions for all tensordicts, IMO: either all keys of shape (batch_size, 1) have to be squeezed, or the opposite. I understand that for reward and done one way might be more comfortable, but I find it confusing that some keys and specs are squeezed and others are not.

Collaborator

Yep, it's been on my radar for quite some time. It's a conflicting thing:
Before, tensordict was designed such that you could not store tensors with a shape identical to the tensordict's (you had to unsqueeze, or it was done for you).
That was surprising to many users, so we dropped it.
But as mentioned above, reward and done interact so much with neural nets that it's a pain to work with batched data that does not end with a singleton dimension.
Maybe unsqueezing in the env is not the solution, though; I don't really know. We could delegate that to the value functions and such. My main worry is that it may lead to silent errors, e.g.

done = torch.zeros(10)
next_value = torch.ones(10, 1)
value = done * next_value  # silently gives the wrong 10 x 10 tensor

It's not only a problem for us but also for the users. That's why I'm not sure we're doing them a favour by removing the unsqueeze.

But I'm more than happy to keep the conversation going.

@tcbegley was involved in some of this refactoring; he may have his 2 cents to share.

@vmoens vmoens left a comment

Great work!
I love that the library is becoming more and more compatible with multi-agent settings!

@vmoens vmoens merged commit b845cf2 into pytorch:main Jan 11, 2023
@matteobettini matteobettini deleted the reset_flag branch January 11, 2023 09:16