Conversation

@shaltielshmid
Contributor

Following the discussion on #1126.

@shaltielshmid
Contributor Author

@NiklasGustafsson one thing that has been bothering me about the previous PR and this one, with regard to the toWillCopy() function.

While stepping through the to() call when writing this fix, I saw that most of the attributes are being moved twice: once during the default to() function, which copies all the parameters and buffers, and then again when the attributes themselves are handled. The reason is that after they move, they get a CUDA device index of 0 (the default index), whereas the default torch.CUDA has an index of -1.

The same thing would happen when calling a regular module.cuda().cuda(), if not for the memo fields (_deviceType, _deviceIndex).

Generally, deviceIndex = -1 means the default device. Is there any way to confirm what the default device actually is, to avoid all these extra copies?
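
For concreteness, here is a minimal sketch of the mismatch (assuming a CUDA-enabled build; the comparison at the end illustrates the behavior being described, it is not the actual toWillCopy() implementation):

```csharp
using System;
using TorchSharp;
using static TorchSharp.torch;

// torch.CUDA carries the sentinel index -1; after the move, the tensor
// reports the concrete index it actually landed on.
var t = zeros(3, 4).to(CUDA);

Console.WriteLine(t.device_index);  //  0 -- resolved index
Console.WriteLine(CUDA.index);      // -1 -- "default device" sentinel

// A naive index comparison therefore reports a pending move even though
// the tensor is already on the target device:
bool wouldMove = t.device_type != CUDA.type || t.device_index != CUDA.index;
Console.WriteLine(wouldMove);       // true
```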

@NiklasGustafsson
Contributor

Interesting. No, I haven't seen a way of enumerating the devices or getting metadata on them. -1 is supposed to imply the "best" available device, which isn't necessarily 0, if I understand the logic correctly. On my workstation, I have a P400 and a 2080 SUPER; -1 is supposed to pick the latter.

@shaltielshmid
Contributor Author

shaltielshmid commented Dec 11, 2023

Ah, I see.
So I guess there isn't anything to do except have it copy everything.
It essentially renders the toWillCopy() function useless.

@NiklasGustafsson
Contributor

> Ah, I see. So I guess there isn't anything to do except have it copy everything. It essentially renders the toWillCopy() function useless.

I don't think that's true. There's CPU to consider, too, and for type conversion, if the source and target types are the same...

@shaltielshmid
Contributor Author

Right.
I meant that when calling it with DeviceType.CUDA and index = -1, the function will always return true.
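
Something like this hypothetical stand-in (not the actual TorchSharp implementation) shows both points: the check still pays off for CPU, but degenerates for CUDA with index -1:

```csharp
using System;
using TorchSharp;
using static TorchSharp.torch;

// Hypothetical stand-in for the check under discussion, not the actual
// TorchSharp toWillCopy() implementation.
static bool WillCopy(Tensor t, Device target) =>
    t.device_type != target.type || t.device_index != target.index;

var cpu = zeros(2, 2);
Console.WriteLine(WillCopy(cpu, CPU));   // false -- CPU indices agree, the check helps

var gpu = cpu.to(CUDA);
Console.WriteLine(WillCopy(gpu, CUDA));  // true -- the resolved index (e.g. 0) never equals -1
```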

@shaltielshmid
Contributor Author

I browsed through the libtorch code, and I believe that for CUDA, index -1 gets converted into a real index here.

Do we want to add handling so that when .to() is called with CUDA and index = -1, we pull the index using that function and use it as the basis for checking whether parameters need to be moved?

Alternatively, since the parameters aren't actually being copied, it's not a huge performance issue, and we can just let it be.
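
A sketch of what that handling might look like; currentCudaDevice below is a hypothetical stand-in for a binding to that libtorch call, not an existing TorchSharp API:

```csharp
using System;
using TorchSharp;

// Resolve the -1 sentinel to a concrete index before the move check, so
// "already on that device" is detected correctly. The delegate stands in
// for a (hypothetical) binding to libtorch's current-device query.
static int ResolveIndex(torch.Device target, Func<int> currentCudaDevice) =>
    target.type == DeviceType.CUDA && target.index == -1
        ? currentCudaDevice()
        : target.index;
```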

@NiklasGustafsson
Contributor

> I browsed through the libtorch code, and I believe that for CUDA, index -1 gets converted into a real index here.
>
> Do we want to add handling so that when .to() is called with CUDA and index = -1, we pull the index using that function and use it as the basis for checking whether parameters need to be moved?
>
> Alternatively, since the parameters aren't actually being copied, it's not a huge performance issue, and we can just let it be.

I'm all for following the PyTorch behavior as closely as possible, everywhere, with one exception: if a functionally equivalent alternative is higher-performance, then go with the higher-performance option.

@shaltielshmid
Contributor Author

shaltielshmid commented Dec 12, 2023

The PyTorch behavior is to not allow a CUDA device index of -1, so this would be more of an enhancement.
PyTorch always calls the .to() function.

@shaltielshmid
Contributor Author

For now, the behavior is the same as PyTorch's, so I think it's fine to leave it.
But it's worth pondering for the future whether there's a way to know.
The performance gain wouldn't be significant anyway, since the tensors aren't copied; only a new C# object is created.

@NiklasGustafsson merged commit 6d4d20e into dotnet:main Dec 12, 2023
@shaltielshmid deleted the jit-attributes-to branch December 12, 2023 23:15