Background generator #323
Conversation
Two suggestions:
torchaudio/datasets/utils.py
def run(self):
    for item in self.generator:
        self.queue.put(item)
    self.queue.put(None)
An iterator might very well return None (such as indicating an empty datapoint). We should use a custom poison pill, something like a very specific class.
Created a class _End to mark the end of the queue, and added a check for it.
I've just renamed the class. What would wrapping the class in a function add? The notation is already that of a function.
Mostly consistency. Classes are supposed to use CamelCase, and factory functions aren't an uncommon pattern. It also allows us to write simple factories that initialize a complex object. There might be a few more advantages; we should think about it.
We should respect the naming convention, so I'd be in favor of that. Do you have in mind a function like this?
It could be.
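One possible shape for such a factory, sketched with illustrative names (a CamelCase thread class behind a lowercase factory function; this is an assumption about the design, not the PR's final code):

```python
import queue
import threading


class BackgroundGenerator(threading.Thread):
    """Thread that eagerly pulls items from a generator into a queue."""

    _SENTINEL = object()  # stand-in for the _End poison pill

    def __init__(self, generator, maxsize=2):
        super().__init__(daemon=True)
        self.generator = generator
        self.queue = queue.Queue(maxsize)
        self.start()

    def run(self):
        for item in self.generator:
            self.queue.put(item)
        self.queue.put(self._SENTINEL)

    def __iter__(self):
        while True:
            item = self.queue.get()
            if item is self._SENTINEL:
                return
            yield item


def bg_iterator(generator, maxsize=2):
    # Factory: hides the thread/queue machinery behind a plain call.
    return BackgroundGenerator(generator, maxsize)
```

Callers then just write `for x in bg_iterator(gen):` without touching the class.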
Ok. Should we then also change the DiskCache?
Since an iterator may be sequential, the class simply detaches a thread to run separately. I'd put this out of the scope of this PR. To benefit from having many workers when the iterator is not sequential, we would then need to add something like concurrent.futures in the thread.
Ideally DiskCache is set up so that it can be combined with bg_iterator. Maybe this works: disk_cache writes the iterator values to disk as it reads them, and bg_iterator allows that process to happen in the background.
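A sketch of that composition idea (the `disk_cache` name, the pickle-per-item file layout, and the numbering are assumptions for illustration, not torchaudio's actual DiskCache):

```python
import os
import pickle
import tempfile


def disk_cache(generator, directory):
    """Yield each item unchanged while writing it to disk as it is read."""
    for i, item in enumerate(generator):
        # One pickle file per item; the index gives a stable ordering.
        with open(os.path.join(directory, f"{i}.pkl"), "wb") as f:
            pickle.dump(item, f)
        yield item


with tempfile.TemporaryDirectory() as d:
    # Wrapping this call in a background iterator would move the
    # disk writes off the consuming thread.
    values = list(disk_cache(iter([10, 20, 30]), d))
    cached = sorted(os.listdir(d))
```

Because `disk_cache` is itself just a generator, `bg_iterator(disk_cache(gen, d))` composes naturally.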
yup :)
Done.
Yeah, I think that's a good idea, if you agree. Ok, so at this point this looks good. I'll accept it :)
Relates to pytorch/pytorch#41292
We can prefetch the upcoming values of a generator. Since a generator may be strictly sequential, this computes new values in the background but not in parallel, unlike here.
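The "background, not parallel" point can be demonstrated with a small timing sketch (names and delays are illustrative): values are still produced one at a time, but production overlaps with consumption, so the loop takes roughly `n * delay` instead of `2 * n * delay`.

```python
import queue
import threading
import time

_END = object()  # illustrative sentinel


def bg_iterator(generator, maxsize=2):
    """Prefetch up to `maxsize` upcoming values in a background thread."""
    q = queue.Queue(maxsize)

    def run():
        for item in generator:
            q.put(item)
        q.put(_END)

    threading.Thread(target=run, daemon=True).start()
    while True:
        item = q.get()
        if item is _END:
            return
        yield item


def slow_source(n, delay=0.02):
    for i in range(n):
        time.sleep(delay)  # simulate an I/O-bound producer
        yield i


# Producer and consumer sleeps overlap, but values are never computed
# in parallel with each other: the generator stays strictly sequential.
items = []
for x in bg_iterator(slow_source(5)):
    time.sleep(0.02)  # simulate consumer work
    items.append(x)
```

The consumer still sees the values in order, which is the property that makes this safe for sequential iterators.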
See source.