Clarify regular expression timeout #7894

danmoseley · 2022-03-30T01:35:08Z

Clarify the effect of the timeout in one place (there's a huge number of places where the timeout is mentioned, but this seems like the key one). I'll also open a PR against the docs repo for the best practices.
Also fix some ".NET Framework" to be ".NET".

context

Various regular expression APIs accept a timeout parameter. (A default can also be set through the AppDomain.) This appears to set an upper bound on the execution time of the regular expression. However, the purpose of the timeout feature is not to put a hard limit on the execution time of arbitrary patterns. It is specifically to help prevent denial-of-service attacks that exploit backtracking behavior, which might otherwise cause quadratic or worse execution times. Depending on the pattern, it may not be feasible to determine whether an untrusted input will cause this behavior without actually running the match, and that makes the timeout feature necessary.
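As a rough illustration of the two ways to supply a timeout, here is a minimal C# sketch. The class and method names (`RegexTimeoutSketch`, `TryIsMatch`, `SetDefaultTimeout`) are hypothetical and exist only for this example. The `Regex.IsMatch` overload, `RegexMatchTimeoutException`, and the `REGEX_DEFAULT_MATCH_TIMEOUT` AppDomain key are the documented .NET APIs.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexTimeoutSketch
{
    // Hypothetical helper: returns true/false when matching completes,
    // or null when the match timeout fires.
    public static bool? TryIsMatch(string pattern, string input, TimeSpan timeout)
    {
        try
        {
            // Static overload that takes an explicit match timeout.
            return Regex.IsMatch(input, pattern, RegexOptions.None, timeout);
        }
        catch (RegexMatchTimeoutException)
        {
            // The engine gave up while matching; the caller decides how to
            // treat the input (for example, reject it).
            return null;
        }
    }

    // Sets the AppDomain-wide default used by Regex instances that do not
    // specify their own timeout. Assumption: this is called at startup,
    // before the Regex class is first used, since the default is read once.
    public static void SetDefaultTimeout(TimeSpan timeout)
    {
        AppDomain.CurrentDomain.SetData("REGEX_DEFAULT_MATCH_TIMEOUT", timeout);
    }
}
```

A pattern such as `^(a+)+$` run against a long string of `a` characters followed by `!` is the usual textbook case of excessive backtracking. Whether it actually hits the timeout depends on the engine version and the options in use, so treat the `null` result as "possibly hostile input" rather than as a guarantee.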

In some cases, matching may take arbitrarily longer than the timeout specifies. One example is where the execution time is dominated by simply scanning the input for literal text (such as a newline character). This operation is essentially linear in the size of the input, with the constant varying depending on the pattern and input, and is optimized entirely for speed. It does not check the timeout. The mitigation for the execution time of this phase, when the input is untrusted, is to limit the size of that input yourself before starting matching.
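The mitigation above could be sketched as follows, assuming a caller-chosen length cap. `BoundedInputSketch`, `TryIsMatch`, and the `MaxInputLength` value are hypothetical names and numbers chosen for illustration, not part of any .NET API.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundedInputSketch
{
    // Hypothetical limit; pick one that fits the data you actually expect.
    public const int MaxInputLength = 10_000;

    // Returns false without running the regex when the input is missing or
    // over the limit, and also when the match timeout fires.
    // On success, isMatch holds the result of the match.
    public static bool TryIsMatch(Regex regex, string input, out bool isMatch)
    {
        isMatch = false;

        // Bound the linear scanning phase, which does not check the timeout.
        if (input is null || input.Length > MaxInputLength)
        {
            return false;
        }

        try
        {
            isMatch = regex.IsMatch(input);
            return true;
        }
        catch (RegexMatchTimeoutException)
        {
            // The timeout still guards the backtracking phase.
            return false;
        }
    }
}
```

The length check and the timeout cover different phases: the cap bounds the linear scan, while the timeout bounds backtracking, so a caller handling untrusted input would normally want both.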

ghost · 2022-03-30T01:35:29Z

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Author:	danmoseley
Assignees:	-
Labels:	`area-System.Text.RegularExpressions`
Milestone:	-

opbld30 · 2022-03-30T02:02:31Z

Docs Build status updates of commit a0266d6:

✅ Validation status: passed

File	Status	Preview URL
xml/System.Text.RegularExpressions/CaptureCollection.xml	✅Succeeded	View
xml/System.Text.RegularExpressions/MatchCollection.xml	✅Succeeded	View
xml/System.Text.RegularExpressions/Regex.xml	✅Succeeded	View
xml/System.Text.RegularExpressions/RegexCompilationInfo.xml	✅Succeeded	View

For more details, please refer to the build report.

Note: Broken links written as relative paths are included in the above build report. For broken links written as absolute paths or external URLs, see the broken link report.

For any questions, please:

Try searching the docs.microsoft.com contributor guides
Post your question in the Docs support channel

opbld31 · 2022-03-30T04:55:16Z

Docs Build status updates of commit 1445249:

✅ Validation status: passed

File	Status	Preview URL
xml/System.Text.RegularExpressions/CaptureCollection.xml	✅Succeeded	View
xml/System.Text.RegularExpressions/MatchCollection.xml	✅Succeeded	View
xml/System.Text.RegularExpressions/Regex.xml	✅Succeeded	View
xml/System.Text.RegularExpressions/RegexCompilationInfo.xml	✅Succeeded	View

joperezr · 2022-03-30T21:50:03Z

Thanks for fixing this @danmoseley. @gewarren @carlossanlop do these doc updates ever flow back to the repo's triple-slash comments? I suppose the answer is no, and if so I can make these updates on that side too.

danmoseley · 2022-03-30T22:15:31Z

My assumption was that we don't flow backwards, we just wait until the point where we want to make the code of this library the source of truth, at which point someone has to manually find the best combination of both sides and update the sources. If that's right, then there's not much value in updating the text in the code until then, as the content is so divergent at this point, it wouldn't merge properly anyway. (?) @carlossanlop ?

danmoseley added 2 commits March 29, 2022 19:14

typos

79699d7

note about timeout

a0266d6

danmoseley requested a review from a team as a code owner March 30, 2022 01:35

ghost added the area-System.Text.RegularExpressions label Mar 30, 2022

danmoseley requested review from gewarren and joperezr March 30, 2022 01:35

danmoseley mentioned this pull request Mar 30, 2022

Clarify regex timeout dotnet/docs#28854

Merged

gewarren approved these changes Mar 30, 2022

Update xml/System.Text.RegularExpressions/Regex.xml

1445249

Co-authored-by: Genevieve Warren <[email protected]>

danmoseley merged commit d58f72b into dotnet:main Mar 30, 2022

danmoseley deleted the regex.timeout branch March 30, 2022 16:29
