Add descriptions to accelerator broadcast function/clean up all_gather #6044
Conversation
    The results of the last training/testing run will be cached within the training type plugin.
    In distributed training, we make sure to transfer the results to the appropriate master process.
    """
    # TODO: improve these docs
Improved enough already to remove this?
I don't like rando todos, so let me remove it :)
my comment, but I don't remember what I wanted to improve 🤣
    def barrier(self, name: Optional[str] = None) -> None:
        self.training_type_plugin.barrier(name=name)

    def broadcast(self, obj: object, src: int = 0) -> object:
Shall we rather call it source_rank then?
It's called src in the lower-level PyTorch call that is commonly used outside of TPU / some training type plugins:
https://pytorch.org/docs/stable/distributed.html#torch.distributed.broadcast
So I think it might be alright to keep it as is!
I vote for keep :)
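For context on the naming discussion above, here is a minimal sketch of how the lower-level torch.distributed.broadcast call uses its src parameter; the helper name and the assumption of an already-initialized process group are illustrative, not part of this PR:

```python
import torch
import torch.distributed as dist


def broadcast_from_source(tensor: torch.Tensor, src: int = 0) -> torch.Tensor:
    """Overwrite ``tensor`` on every rank with the copy held by rank ``src``.

    Assumes the default process group has already been initialized,
    e.g. via ``dist.init_process_group("gloo", ...)``.
    """
    # `src` is the rank that owns the authoritative copy; all other ranks
    # receive that copy in place.
    dist.broadcast(tensor, src=src)
    return tensor
```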
Co-authored-by: Jirka Borovec <[email protected]>
Codecov Report
@@           Coverage Diff            @@
##           master   #6044     +/-   ##
========================================
  Coverage      93%     93%
========================================
  Files         160     160
  Lines       11392   11397      +5
========================================
+ Hits        10578   10590     +12
+ Misses        814     807      -7
What does this PR do?
Addresses @Borda's comment about adding a better docstring to the broadcast function in #6039.
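For illustration only (not necessarily the exact wording merged here), a docstring along these lines would document the object and the src argument; the body mirrors the delegation pattern shown in the diff above:

```python
def broadcast(self, obj: object, src: int = 0) -> object:
    """Broadcast an object from the source process to all other processes.

    Args:
        obj: the object to broadcast, e.g. a tensor, a collection of tensors,
            or an arbitrary picklable object.
        src: the rank of the process from which the object is broadcast.

    Returns:
        The object received on each process (identical to ``obj`` on ``src``).
    """
    return self.training_type_plugin.broadcast(obj, src)
```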
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines. In short, see the following bullet list:
Did you have fun?
Make sure you had fun coding 🙃