Make startup timeout recycle worker process #25321
This is really concerning because you don't want a site stuck in an infinite restart loop. That would be especially bad in a cloud environment where you're paying for CPU time. How bad would it be to forcibly crash so that the rapid failure protection would kick in?
I disagree that this is "really concerning": the tradeoff is that a site no longer ends up in an unrecoverable state with no clear workaround besides redeploying. It's a tradeoff we need to evaluate, and I currently think the pros outweigh the cons.
I'll get back to you on that.
I pinged people on App Service and IIS and am waiting for responses now 😄
Chatted with @Tratcher, who is fine with the change, but let's add an opt-out flag to ANCM as a workaround in case people hit issues.
@Tratcher updated. Let me know if you like the name of the config section.
src/Servers/IIS/AspNetCoreModuleV2/CommonLib/ConfigurationSection.h
@Pilchie please merge when green 😄
Approved for RC2.
Fixes #24485
Today, if the startup time limit is hit, IIS gets into an unrecoverable state until someone redeploys the site. This usually isn't a concern; however, many people have multiple w3wp sites starting up at the same time, which occasionally causes a w3wp process to hit the startup time limit, especially if there is a lot going on before startup.
With this change, instead of leaving the site in an unrecoverable state, ANCM queues a recycle of the worker process.
This change has some drawbacks that should be noted. Because the process is now restarted, IIS will consume more resources. I think this drawback is acceptable for normal scenarios where the app just failed to start: restarting the app is preferable there, as a restart will most likely resolve the timeout.
Let's think about the worst-case scenario. Say Program.Main has a Thread.Sleep, causing the process to always fail to start. Each time the process fails to start, we recycle the worker process. From what I can tell, the rapidFailureProtectionModule will not trigger, because we are enqueuing a recycle of the worker process rather than the worker process crashing (I need to confirm this, but #25163 is blocking validation). So, if the startup time limit is set to 1 second, I'm concerned about IIS constantly needing to recycle the process. I'll chat with some IIS folks about that as well.
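To put a rough number on that worst case, here is a minimal back-of-the-envelope model. It is not ANCM code: `TimeoutPolicy`, `StartupAttempts`, and the `recycleOverheadSeconds` parameter are hypothetical names made up for illustration, and it assumes that rapid failure protection never stops the loop, which is still unconfirmed.

```cpp
#include <cstdint>

// Hypothetical model, not ANCM code: what happens after a startup timeout.
enum class TimeoutPolicy {
    StayFailed,    // before this change: the site stays broken until redeployed
    RecycleWorker  // after this change: a worker process recycle is queued
};

// Counts how many startup attempts begin within windowSeconds for an app that
// never finishes starting (for example, Program.Main blocked in Thread.Sleep).
// Assumption: rapid failure protection does not intervene, so every timeout
// leads to another recycle. recycleOverheadSeconds is an assumed fixed cost of
// tearing down and relaunching the worker process.
constexpr std::uint64_t StartupAttempts(TimeoutPolicy policy,
                                        std::uint64_t startupTimeLimitSeconds,
                                        std::uint64_t recycleOverheadSeconds,
                                        std::uint64_t windowSeconds)
{
    if (windowSeconds == 0) {
        return 0;
    }
    if (policy == TimeoutPolicy::StayFailed) {
        return 1;  // one failed start, then nothing until a redeploy
    }
    const std::uint64_t sum = startupTimeLimitSeconds + recycleOverheadSeconds;
    const std::uint64_t cycle = (sum == 0) ? 1 : sum;  // avoid dividing by zero
    return (windowSeconds + cycle - 1) / cycle;        // round up: attempts started
}

// With a 1 second limit and no overhead, one hour means 3600 attempts.
static_assert(StartupAttempts(TimeoutPolicy::RecycleWorker, 1, 0, 3600) == 3600,
              "1 second limit recycles once per second");
```

Under the same assumptions, a 120 second limit gives 30 attempts per hour, so the concern is mostly about very small limits.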
The alternatives are either to increase the startup timeout, which requires a schema update (these are hard to deploy) and/or shim changes, or to disable the limit for in-process hosting, which is concerning because people may want to use the limit itself.