-
Notifications
You must be signed in to change notification settings - Fork 5.2k
mitigation for quic tests hangs #55985
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Tagging subscribers to this area: @dotnet/ncl Issue DetailsIt seems like the hang is when we run synchronous conformance tests. big thanks to @stephentoub who pointed to #53471 and suggested this mitigation. contributes to #55642
|
|
Can you fill me in on why this is an improvement? And can we add a comment with that info too? Is this what we should be doing in general when we have to do a sync wait on an async Task? There are other places we do this. |
|
I'm putting details on #55642 as follow up. The old code should work just fine but it does not. The improvement is that the tests no longer hang. |
As I told @wfurt, we shouldn't ship this way, but it's a fine stopgap temporarily to unblock tests if it helps. This was an experiment I suggested @wfurt try based on what I saw in the dump, which was that the thread pool had a few tasks in its global queue waiting to be run, very few threads in the system, nothing actively being executed, and yet those work items in the thread pool not being processed. I suspected it might have to do with @kouvel's recent changes to (ironically) more aggressively introduce threads to better handle sync-over-async, which hooks synchronously waiting on Task. So to see if that might be the cause, I suggested changing it to still block but do so via a WaitHandle rather than Task.Wait, and it apparently made the issue go away. |
|
Ok. @wfurt Can you add a comment in the code with a link to the original issue, something like I want to make sure we don't lose track of this. |
|
Should we close #55642? |
|
I would keep it open to 1) verify the stability and 2) have proper fix |
|
likely? and we should revert this change before we ship. |
|
I think this discussion should be move to the original issue then 😄 |
* Fix thread pool hang - In #53471 the thread count goal was moved out of `ThreadCounts`, it turns out that are a few subtle races that it was avoiding. There are other ways to fix it, but I've added the goal back into `ThreadCounts` for now. - Reverted PR #55985, which worked around the issue in the CI Fixes #55642 * Revert "mitigation for quic tests hangs (#55985)" This reverts commit 0a5e93b.
It seems like the hang is when we run synchronous conformance tests.
We receive notification, we reset the completion source but the WriteAsync task would never wake up and be finished.
big thanks to @stephentoub who pointed to #53471 and suggested this mitigation.
This is not final fix but I did few hundreds runs without problem when it would generally hang in less then 10 before on my dev box. This should at least improve PR experience and CI failure rate.
I will follow-up with details to help find better fix e.g. thread pool.
contributes to #55642