Add AsyncSource template for speculative execution #903
Conversation
@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Nice, code looks pretty clean and easy to understand :) A few minor API and test comments:
velox/common/base/AsyncSource.h
Outdated
// will either wait for the make to finish or run the make on its
// own thread.
bool isPending() const {
  return make_ && !item_;
shouldn't this test making_ instead of make_? For instance, this will return true if the object was just constructed but before prepare() was called, which sounds a bit counter-intuitive at a first look at the API. Is that the intended behavior?
// Returns the item to the first caller and nullptr to subsequent callers. If
// the item is preparing on the executor, waits for the item and otherwise
// makes it on the caller thread.
std::unique_ptr<Item> move() {
Another way to implement this method (in a truly async way) is to make it return a Future instead of blocking internally. The client of this API could then decide whether to block or to propagate that future. Internally you would have to hold a SharedPromise instead of a Promise.
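For illustration, a rough sketch of that alternative, holding a folly::SharedPromise and handing out SemiFutures. The FutureSource name and members here are hypothetical, not the actual AsyncSource code:

#include <folly/futures/SharedPromise.h>

#include <functional>
#include <memory>

// Hypothetical FutureSource sketching the SharedPromise-based variant
// suggested above; not the actual AsyncSource implementation.
template <typename Item>
class FutureSource {
 public:
  explicit FutureSource(std::function<std::unique_ptr<Item>()> make)
      : make_(std::move(make)) {}

  // Runs 'make_' (typically on a background executor) and fulfills the
  // shared promise so that every waiter is notified.
  void prepare() {
    promise_.setValue(std::shared_ptr<Item>(make_()));
  }

  // Returns a SemiFuture; the caller decides whether to block on it or to
  // chain it onto an executor of its choosing.
  folly::SemiFuture<std::shared_ptr<Item>> get() {
    return promise_.getSemiFuture();
  }

 private:
  std::function<std::unique_ptr<Item>()> make_;
  // SharedPromise allows multiple getSemiFuture() calls; the value must be
  // copyable, hence shared_ptr instead of unique_ptr.
  folly::SharedPromise<std::shared_ptr<Item>> promise_;
};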
if (make) {
  return make();
}
auto& exec = folly::QueuedImmediateExecutor::instance();
This is the part where ideally you would return the SemiFuture so that the client could attach their preferred executor instead of blocking the current thread.
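For example, if move() returned a folly::SemiFuture, the caller could pick between blocking and chaining; 'source', 'cpuExecutor', 'onItem' and 'mustBlock' below are placeholders for illustration only:

// Hypothetical caller-side usage if move() returned a SemiFuture.
folly::SemiFuture<std::shared_ptr<Item>> future = source.move();
if (mustBlock) {
  // Block on the calling thread, which matches the current behavior.
  onItem(std::move(future).get());
} else {
  // Attach a preferred executor and continue asynchronously; the returned
  // Future would need to be kept alive by the caller.
  auto chained = std::move(future).via(cpuExecutor).thenValue(
      [](std::shared_ptr<Item> item) { onItem(std::move(item)); });
}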
  int32_t id;
};

TEST(AsyncSourceTest, basic) {
I would expect the "basic" test to test the basic API: construct a single async source, check isPending() and hasValue(), then call prepare(), check that the methods return the expected results again, then move the results out, etc.
Then have the test below as a "multi-thread" test that checks the concurrent behavior.
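For instance, such a basic test might look roughly like the sketch below, assuming the constructor takes the make function and that prepare(), hasValue() and move() behave as documented in the diff; this is not the test as merged:

TEST(AsyncSourceTest, basicSketch) {
  AsyncSource<int32_t> source([]() { return std::make_unique<int32_t>(42); });

  // Freshly constructed: no value yet.
  EXPECT_FALSE(source.hasValue());

  // prepare() runs the make function; normally this happens on an executor.
  source.prepare();
  EXPECT_TRUE(source.hasValue());

  // The first move() hands out the item, subsequent calls return nullptr.
  auto item = source.move();
  ASSERT_NE(item, nullptr);
  EXPECT_EQ(*item, 42);
  EXPECT_EQ(source.move(), nullptr);
}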
The use case is to schedule a lot of computations with the expectation that most will be used, but the order of use is not necessarily a queue order. If this were strictly a queue, this would be a future. So the pattern is that many threads throw things to be executed ahead of time, for example opening files, processing file metadata, pre-reading some densely referenced columns. Now these computations may advance at very different rates, and therefore strictly queueing them is not optimal. Suppose a big query fills the prefetch queue; a fast query with a very small read would then be blocked behind the slow one.
The use case has the user of the data blocking on the data, so a use case like a future ready callback or chaining is not intended. If we had move() return a SemiFuture, this would be moving a scheduled item from one executor to another. Here we just move it to the sync user; we only differ in the queue order.
As an example, consider receiving splits. There are some 100 splits that will be consumed by 15 threads. Let's say that the next 30 splits are queued for file open and metadata. The worker threads then look at the split queue and try to get one that is ready. If there is nothing ready, they take the first not-ready one. This could be executing, but if not, it should not be queued behind some other query's split; instead it is better to do the operation synchronously on the caller thread, since it has nothing better to do. There are cases where doing anything other than a sync wait is impractical, e.g. DwrfReader waiting for the next stripe. So we don't really have a good case for returning a SemiFuture.
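To make the split example concrete, here is a condensed sketch of that pattern. SplitState, openAndReadMetadata, ioExecutor and nextSplits are illustrative placeholders, not Velox code:

// Producer side: queue the next splits for background open/metadata work.
std::vector<std::shared_ptr<AsyncSource<SplitState>>> prefetched;
for (const auto& split : nextSplits) {
  auto source = std::make_shared<AsyncSource<SplitState>>(
      [split]() { return openAndReadMetadata(split); });
  ioExecutor->add([source]() { source->prepare(); });
  prefetched.push_back(source);
}

// Consumer side: prefer a split whose preparation already finished; if none
// is ready, take the first one and let move() run the work on this thread,
// turning the still-queued background call into a no-op. Erasing consumed
// entries from 'prefetched' is omitted for brevity.
std::unique_ptr<SplitState> next;
for (const auto& source : prefetched) {
  if (source->hasValue()) {
    next = source->move();
    break;
  }
}
if (!next && !prefetched.empty()) {
  next = prefetched.front()->move();
}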
+ bool isPending() const {
+ return make_ && !item_;
shouldn't this test making_ instead of make_? For instance, this will return true if the object was just constructed but before prepare() was called, which sounds a bit counter-intuitive at a first look at the API. Is that the intended behavior?
A: This is not needed, but the comment is correct.
Thanks for replying to the comments and for the tests. Looks good!
AsyncSource encapsulates a background computation. These can be scheduled on a background executor. The difference between this and a future is that when the user requires the result, AsyncSource will perform the async operation on the caller's thread and will turn the background operation into a no-op.
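For illustration, a minimal sketch of that contract, using the member names from the diff; the real class also handles a make() that is already running on the executor, which this sketch omits:

#include <functional>
#include <memory>
#include <mutex>

// Minimal sketch of the contract described above, not the merged class.
template <typename Item>
class AsyncSourceSketch {
 public:
  explicit AsyncSourceSketch(std::function<std::unique_ptr<Item>()> make)
      : make_(std::move(make)) {}

  // Called from the background executor: run make() and store the item.
  void prepare() {
    std::function<std::unique_ptr<Item>()> make;
    {
      std::lock_guard<std::mutex> l(mutex_);
      make = std::move(make_);
      make_ = nullptr;
    }
    if (make) {
      auto item = make();
      std::lock_guard<std::mutex> l(mutex_);
      item_ = std::move(item);
    }
  }

  // If prepare() already produced the item, return it; otherwise run make()
  // on the caller's thread, which makes the queued background call a no-op.
  std::unique_ptr<Item> move() {
    std::function<std::unique_ptr<Item>()> make;
    {
      std::lock_guard<std::mutex> l(mutex_);
      if (item_) {
        return std::move(item_);
      }
      make = std::move(make_);
      make_ = nullptr;
    }
    return make ? make() : nullptr;
  }

 private:
  std::mutex mutex_;
  std::function<std::unique_ptr<Item>()> make_;
  std::unique_ptr<Item> item_;
};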