-
Notifications
You must be signed in to change notification settings - Fork 3
Add timeout functionality to IntersectClient. #36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: candidate-0.9.0
Are you sure you want to change the base?
Conversation
|
@andrewfayres could you make the target branch |
Signed-off-by: Lance-Drane <[email protected]>
Signed-off-by: Lance-Drane <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed a new commit to reflect the changes of the "UserspaceMessage" format, which fixed two of the unit tests for me. However, one of the new unit tests is still failing for me, and the E2E tests are also failing.
Additionally, I thought we also wanted the Service to potentially "fail fast" in the event that the message is being processed after a specific point in time? This work is just to stop the Service from having to process a long-term message, and I think its most important usage is for immediately acknowledging and discarding the message without going through processing.
Otherwise, I think the unit tests themselves do a good job for capturing requirements, and this should at least cover the Client side well, so once all the tests pass I'll go ahead and merge this in.
| # Wait to make sure timeout doesn't fire | ||
| time.sleep(0.7) | ||
|
|
||
| assert len(timeout_called) == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On this block of code, I'm currently getting len(timeout_called) == 1, so the function is being called anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is being caused by the line: here. Basically, because the test sets the flag _terminate_after_initial_messages, we return early and never really process the message. Looking at the code, there's a few other scenarios that could cause this. When you're back from vacation I think we should have a quick discussion about what we want the behavior to be in these cases. Currently, I lean towards putting the logic to remove the calls from the timeout check at the top before all of the fast exits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, I lean towards putting the logic to remove the calls from the timeout check at the top before all of the fast exits.
This is probably fine. I consider the fast exit usecase itself as somewhat niche (and one which we wouldn't attempt to replicate on the general-purpose orchestrator, which is what most people will end up using), and also one which shouldn't be setting timeouts on any of its own calls, so doing the timeout logic before the fast exit logic shouldn't cause any problems. Realistically I think the tests should probably not even use fast exit and should probably just use simple callbacks, the callbacks themselves could probably just assert that the message is or is not an error.
| # If not in pending requests, it already timed out, so ignore this response | ||
| if headers.operation_id in self._pending_requests: | ||
| del self._pending_requests[headers.operation_id] | ||
| else: | ||
| logger.debug( | ||
| f'Received response for operation {headers.operation_id} that already timed out, ignoring' | ||
| ) | ||
| return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The E2E tests are failing and I believe it's partially related to this block of code, because the existing E2E tests do not explicitly specify a timeout. self._pending_requests is not necessarily the source of truth for requests which haven't timed out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, you're right. In addition to it being an issue for when a timeout wasn't set, it was an issue for when a client gets restarted. I've changed the logic to account for this. Will push everything after we discuss the above.
| if params.timeout is not None and params.on_timeout is not None: | ||
| self._pending_requests[params.operation].append( | ||
| { | ||
| 'timeout': time.time() + params.timeout, | ||
| 'on_timeout': params.on_timeout, | ||
| } | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is another part of the reason I think the E2E tests are failing. If the user doesn't specify a timeout on a pending request, then by default the request should be handled.
I also don't think that params.on_timeout should explicitly be checked for validity purposes, it should just be an optional function called in the specific usecase a user's function times out. A function should be allowed to timeout even if this callback isn't specified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Fixed and will push once we have a chance to discuss the first issue.
Adding timeouts to the clients. We can expand this idea for the service to service side as we want but I wanted to go ahead and get this part moving.