Make startup timeout recycle worker process #25321

jkotalik · 2020-08-27T18:49:09Z

Today, if the startup time limit is hit, IIS gets into a state it cannot recover from until someone redeploys the site. This usually isn't a concern. However, many people have multiple w3wp sites starting up at the same time, which occasionally causes a w3wp process to time out on the startup time limit, especially if there is a lot going on before startup.

With this change, instead of leaving the site in an unrecoverable state, ANCM queues a recycle of the worker process.

There is a drawback to note with this change: because the process is now restarted, IIS will consume more resources. I think this is fine for normal scenarios where the app just failed to start. Restarting the app is preferable there, as a restart will most likely get past the timeout.

Let's think of the worst case scenario. Let's say Program.Main has a Thread.Sleep, causing the process to always fail to start. If the process fails to start, we recycle the worker process. From what I can tell, the rapidFailureProtectionModule will not trigger as we are enqueuing for recycling the worker process rather than the worker process crashing (need to confirm this, but #25163 is blocking validation). So, if the startup time limit is set to 1 second, I'm concerned about IIS constantly needing to recycle the process. I'll chat with some IIS folk about that as well.
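To put a rough number on that worst case, here is a back-of-the-envelope model. It is a sketch under stated assumptions, not a description of how IIS schedules recycles: it assumes every attempt runs for exactly the startup time limit before being recycled, and `restart_overhead_s` is a hypothetical parameter standing in for the time needed to bring up a fresh w3wp.

```python
def recycles_per_hour(startup_time_limit_s: float, restart_overhead_s: float = 0.0) -> int:
    """Upper-bound estimate of recycles in one hour when startup always times out.

    Assumption: each attempt burns the full startup time limit, then
    restart_overhead_s (hypothetical) is spent starting a new worker process.
    """
    if startup_time_limit_s <= 0:
        raise ValueError("startup_time_limit_s must be positive")
    cycle_s = startup_time_limit_s + restart_overhead_s
    return int(3600 // cycle_s)


print(recycles_per_hour(1))       # 1 second limit, no overhead
print(recycles_per_hour(120))     # 120 second limit, no overhead
print(recycles_per_hour(1, 2.0))  # 1 second limit plus 2 s of restart overhead
```

Under these assumptions a 1 second limit allows up to 3600 recycles per hour, while a 120 second limit allows 30, so the cost of the loop depends heavily on how the limit is configured.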

The alternatives are to increase the startup timeout, which requires a schema update (hard to deploy) and/or shim changes, or to disable the limit for in-process hosting, which is concerning because people may want to rely on the limit.

Tratcher · 2020-08-27T19:44:43Z

> From what I can tell, the rapidFailureProtectionModule will not trigger as we are enqueuing for recycling the worker process rather than the worker process crashing (need to confirm this, but #25163 is blocking validation). So, if the startup time limit is set to 1 second, I'm concerned about IIS constantly needing to recycle the process.

This is really concerning because you don't want a site stuck in an infinite restart loop. That would be especially bad in a cloud environment where you're paying for cpu time.

How bad would it be to forcibly crash so that the rapid failure protection would kick in?
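The difference between the two options can be illustrated with a toy crash counter over a sliding time window. This is only a sketch: `RapidFailCounter` and `exits_until_tripped` are hypothetical names, the defaults of 5 crashes in 300 seconds are an assumption about typical IIS rapid-fail settings, and the premise that a queued recycle is not counted as a failure is exactly the point the thread says still needs confirming.

```python
from collections import deque


class RapidFailCounter:
    """Toy model of rapid-fail protection: trips after too many crashes in a window."""

    def __init__(self, max_crashes: int = 5, window_s: float = 300.0) -> None:
        self.max_crashes = max_crashes  # assumed default, not taken from the thread
        self.window_s = window_s        # assumed default, not taken from the thread
        self._crash_times = deque()
        self.tripped = False

    def record_exit(self, now_s: float, crashed: bool) -> bool:
        """Record a worker exit; return True once protection has tripped."""
        # Assumption: only crashes are counted, graceful recycles are ignored.
        if crashed and not self.tripped:
            self._crash_times.append(now_s)
            # Drop crashes that have fallen out of the sliding window.
            while self._crash_times and now_s - self._crash_times[0] > self.window_s:
                self._crash_times.popleft()
            self.tripped = len(self._crash_times) >= self.max_crashes
        return self.tripped


def exits_until_tripped(crashed: bool, attempts: int = 100, period_s: float = 1.0):
    """Return the attempt at which protection trips, or None if it never does."""
    counter = RapidFailCounter()
    for attempt in range(1, attempts + 1):
        if counter.record_exit(now_s=attempt * period_s, crashed=crashed):
            return attempt
    return None


print(exits_until_tripped(crashed=True))   # forced crash on every timeout
print(exits_until_tripped(crashed=False))  # queued recycle on every timeout
```

In this model a forced crash stops the loop on the fifth attempt, while a queued recycle never trips the counter within the 100 simulated attempts.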

jkotalik · 2020-08-27T19:47:56Z

> This is really concerning because you don't want a site stuck in an infinite restart loop. That would be especially bad in a cloud environment where you're paying for cpu time.

I disagree that this is "really concerning", considering that the tradeoff is no longer having a site stuck in an unrecoverable state with no clear workaround besides redeploying. It's a tradeoff we need to evaluate, and I currently think the pros outweigh the cons.

> How bad would it be to forcibly crash so that the rapid failure protection would kick in?

I'll get back to you on that.

jkotalik · 2020-08-27T21:04:47Z

I pinged people on App Service and IIS; waiting for responses now 😄

jkotalik · 2020-08-31T21:56:36Z

Chatted with @Tratcher, who is fine with the change, but let's add an opt-out flag to ANCM as a workaround in case people hit issues.
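The resulting decision can be sketched as follows. This is a minimal illustration and not the ANCM implementation: `TimeoutAction`, `on_startup_timeout`, and the `opt_out_of_recycle` parameter are hypothetical names, and the real name of the configuration setting is not stated in this thread.

```python
from enum import Enum


class TimeoutAction(Enum):
    RECYCLE_WORKER = "recycle"   # new behavior proposed in this PR
    STAY_FAILED = "stay_failed"  # previous behavior: site needs a redeploy


def on_startup_timeout(opt_out_of_recycle: bool = False) -> TimeoutAction:
    """Pick what happens when the startup time limit is hit.

    opt_out_of_recycle is a hypothetical stand-in for the opt-out flag;
    it defaults to False so the recycle behavior is on unless disabled.
    """
    if opt_out_of_recycle:
        return TimeoutAction.STAY_FAILED
    return TimeoutAction.RECYCLE_WORKER


print(on_startup_timeout())                         # default: recycle
print(on_startup_timeout(opt_out_of_recycle=True))  # opted out: old behavior
```

Defaulting the flag to off keeps the fix active for everyone, and only sites that hit problems with repeated recycles need to change their configuration.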

jkotalik · 2020-09-01T16:53:05Z

@Tratcher updated. Let me know if you like the name of the config section.

src/Servers/IIS/AspNetCoreModuleV2/CommonLib/ConfigurationSection.h

jkotalik · 2020-09-01T20:54:50Z

@Pilchie please merge when green 😄

Pilchie · 2020-09-01T22:30:18Z

Approved for RC2.

jkotalik added the area-servers label Aug 27, 2020

jkotalik added this to the 5.0.0-rc2 milestone Aug 27, 2020

jkotalik requested review from BrennanConroy, Tratcher and davidfowl August 27, 2020 18:49

jkotalik requested a review from halter73 as a code owner August 27, 2020 18:49

jkotalik added 3 commits September 1, 2020 09:36

Make startup time recycle worker process

bd18884

oops

329403c

Add opt out switch

bc058fd

jkotalik force-pushed the jkotalik/startupTimeLimit branch from 76ecacc to bc058fd Compare September 1, 2020 16:52

jkotalik changed the base branch from release/5.0 to release/5.0-rc2 September 1, 2020 16:52

Tratcher approved these changes Sep 1, 2020

src/Servers/IIS/AspNetCoreModuleV2/CommonLib/ConfigurationSection.h (outdated, resolved)

jkotalik added 2 commits September 1, 2020 10:19

rename

939bf11

Update StartupTests.cs

b4297c5

jkotalik added the ask-mode label (This issue / PR is a patch candidate which we will bar-check internally before patching it.) Sep 1, 2020

Pilchie added the Servicing-approved label (Shiproom has approved the issue) Sep 1, 2020

Pilchie merged commit ee71226 into release/5.0-rc2 Sep 1, 2020

Pilchie deleted the jkotalik/startupTimeLimit branch September 1, 2020 22:30

BrennanConroy mentioned this pull request Sep 11, 2020

aspnet core 3.1 in-process hosting hitting 500.37 ANCM startup timeout. #25759

Closed

amcasey added the area-networking label (Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions) and removed the area-runtime label Jun 6, 2023
