Fix optimizer load state dict copy tensor by reference #1173
Conversation
Also fixes #1176 by ensuring the conditional state is copied to the right device.
src/TorchSharp/Optimizers/SGD.cs (Outdated)
public override void LoadStateDict(BinaryReader reader)
{
    LoadConditionalStateTensor(reader, ref momentum_buffer);
    momentum_buffer = momentum_buffer.to(_parameter.device);
No need to dispose the old momentum buffer if it's actually moved?
Good catch. Will fix.
Question: Do you want me to write something like:
if (momentum_buffer.device_type != _parameter.device_type || momentum_buffer.device_index != _parameter.device_index) {
    using var copy = result;
    result = copy.to(_parameter.device);
}
(That's not handled in the regular Optimizer.to(Device device) function. See for example here)
I think it's sufficient to check the Handle before and after the call to `to`. I think.
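Concretely, the idea being floated is roughly this (a sketch only; it assumes Tensor.to returns the same instance, with the same Handle, when no move is needed, which is exactly what the replies below call into question):

// Sketch of the handle-check idea (assumes .to() returns the same
// instance when the tensor is already on the target device).
var before = momentum_buffer.Handle;
var moved = momentum_buffer.to(_parameter.device);
if (moved.Handle != before)
    momentum_buffer.Dispose();   // a new tensor was created; release the old one
momentum_buffer = moved;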
Maybe not.
Haha
In that case, should I go with my proposed solution?
Also, does that mean we should update all the Optimizer.to(...) functions?
Maybe we should add a dispose_if_moved parameter to the to function to make it generic?
It's handled in the Module variants of to(). Follow that pattern.
Maybe it's worth writing an internal utility function that has the 'move or not move' predicate, so it can be reused.
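For instance, a reusable helper could look roughly like this (hypothetical name and shape; it just packages the "move only if needed, dispose only if moved" logic discussed above):

// Hypothetical internal helper: move a state tensor to the target device,
// disposing the original only when a new tensor was actually created.
internal static Tensor MoveStateTensor(Tensor tensor, Device device)
{
    if (tensor is null || (tensor.device_type == device.type && tensor.device_index == device.index))
        return tensor;            // already on the target device, nothing to do
    var moved = tensor.to(device);
    tensor.Dispose();             // release the original after the move
    return moved;
}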
Force-pushed from d1b68ad to bbc569e
@NiklasGustafsson What do you think of my solution? I wrote a unit test that moves and casts tensors to the same and to different types/devices, and verified that the behavior was as expected. If this solution is good, I'll wait for this to be merged and then use it in the other PR.
@shaltielshmid -- here's a tip: Once you have created a PR, try to consolidate a number of commits before pushing again. Every push starts a new build on the CI/CD pipeline, which wastes resources.
Of course, I apologize.
step = st_state.step;
square_avg = st_state.square_avg;
acc_delta = st_state.acc_delta;
square_avg = st_state.square_avg.to(_parameter.device, copy: true);
Why do we have to set copy to true here?
If the State Dictionary is actually from another existing optimizer, then the tensors are copied by reference and we don't want two different optimizers sharing a state tensor.
For example:
var lin1 = torch.nn.Linear(10, 10);
var optim1 = torch.optim.Adam(lin1.parameters(), 0.05f);
var lin2 = torch.nn.Linear(10, 10);
var optim2 = torch.optim.Adam(lin2.parameters());
optim2.load_state_dict(optim1.state_dict());
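If the handles were shared rather than copied, a later step on optim2 would also mutate optim1's state, and disposing one optimizer would invalidate the other's tensors; the copy: true in the change above prevents that. A hedged illustration, continuing the example:

// Hypothetical continuation: with reference-copied state, both optimizers
// would hold the same native tensor handles, so this step would also
// advance optim1's moment estimates.
var loss = torch.nn.functional.mse_loss(lin2.forward(torch.rand(4, 10)), torch.rand(4, 10));
loss.backward();
optim2.step();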
In that case, shouldn't the 'copy' argument be passed down from where you know whether it's necessary or not?
I can add it in, if you think it's the right way to do it.
The way I see it, the StateDictionary is an object that is created only by calling Optimizer.state_dict(), and the source Optimizer is the one that manages the tensors in the dictionary. If they are to be disposed, the source Optimizer handles that. Therefore, the new Optimizer should receive a fresh copy of the tensors to manage itself.
Meaning, the StateDictionary is just a wrapper interface for accessing the values of an Optimizer.
I think we don't ever want two Optimizers to share the same tensor handle.
That being said, if you disagree and would like me to add a copy parameter to the load_state_dict function, no problem.
Got it. I think that's probably right. If we later conceive of a case where you do want to share state, we can add an argument and set the default to 'true'.
You have run the unit tests on CUDA, right?
Yup
Merged
Fixes: #1172