@danmoseley
Member

Clarify the effect of the timeout in one place (the timeout is mentioned in a huge number of places, but this seems like the key one). I'll also open a PR against the docs repo for the best practices.
Also change some instances of ".NET Framework" to ".NET".

context

Various regular expression APIs accept a timeout parameter. (A default can also be set through the AppDomain.) This appears to set an upper bound on the execution time of the regular expression. However, the purpose of the timeout feature is not to put a hard limit on the execution time of arbitrary patterns. It is specifically to help prevent denial-of-service attacks that exploit backtracking behavior, which might otherwise cause quadratic or worse execution times. Depending on the pattern, it may not be feasible to determine whether an untrusted input will cause this behavior without actually running the match, and that makes the timeout feature necessary.
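As a rough illustration of the intended use, here is a minimal sketch that passes a timeout to the static `Regex.IsMatch` overload and treats `RegexMatchTimeoutException` as "matching was abandoned". The class and method names (`TimeoutSketch`, `TimesOut`) are hypothetical and not part of this PR.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class TimeoutSketch
{
    // Returns true if matching was abandoned because the timeout elapsed,
    // false if the match attempt ran to completion (match or no match).
    public static bool TimesOut(string pattern, string input, TimeSpan timeout)
    {
        try
        {
            // The timeout is intended as a guard against runaway backtracking,
            // not as a hard limit on total execution time.
            Regex.IsMatch(input, pattern, RegexOptions.None, timeout);
            return false;
        }
        catch (RegexMatchTimeoutException)
        {
            return true;
        }
    }
}
```

A caller might pass a backtracking-prone pattern such as `^(a+)+$` together with a long run of `a` characters followed by `!`. Whether that particular input actually times out depends on the runtime version, the regex engine options, and the timeout chosen.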

In some cases matching may take arbitrarily longer than the timeout specifies. One example is where the execution time is dominated by simply scanning the input for literal text (such as a newline character). This operation is essentially linear in the size of the input, with the constant varying depending on the pattern and input, and it is optimized entirely for speed, so it does not check the timeout. The mitigation for the execution time of this phase, when matching untrusted input, is simply to limit the size of that input yourself before starting matching.
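The mitigation described above could be sketched as follows: check the input length before matching, and keep the timeout as a second line of defense against backtracking. The `BoundedMatch` class, the `TryIsMatch` method, and the `MaxInputLength` value are all assumptions made for illustration; a suitable limit depends on the application.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class BoundedMatch
{
    // Assumed limit; choose a value appropriate to the application.
    public const int MaxInputLength = 10_000;

    // Returns false if the input was rejected for size or matching timed out;
    // otherwise returns true and reports the match result in isMatch.
    public static bool TryIsMatch(Regex regex, string input, out bool isMatch)
    {
        isMatch = false;

        // Bound the linear scanning cost, which the timeout does not cover.
        if (input is null || input.Length > MaxInputLength)
        {
            return false;
        }

        try
        {
            // The regex's own MatchTimeout still guards against backtracking.
            isMatch = regex.IsMatch(input);
            return true;
        }
        catch (RegexMatchTimeoutException)
        {
            return false;
        }
    }
}
```

The `Regex` passed in would typically be constructed with a timeout, for example `new Regex(pattern, RegexOptions.None, TimeSpan.FromSeconds(1))`, so that both protections apply to the same call.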

@danmoseley danmoseley requested a review from a team as a code owner March 30, 2022 01:35
@ghost

ghost commented Mar 30, 2022

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Author: danmoseley
Assignees: -
Labels: area-System.Text.RegularExpressions
Milestone: -

@opbld30

opbld30 commented Mar 30, 2022

Docs Build status updates of commit a0266d6:

✅ Validation status: passed

| File | Status | Preview URL | Details |
| --- | --- | --- | --- |
| xml/System.Text.RegularExpressions/CaptureCollection.xml | ✅ Succeeded | View | |
| xml/System.Text.RegularExpressions/MatchCollection.xml | ✅ Succeeded | View | |
| xml/System.Text.RegularExpressions/Regex.xml | ✅ Succeeded | View | |
| xml/System.Text.RegularExpressions/RegexCompilationInfo.xml | ✅ Succeeded | View | |

For more details, please refer to the build report.

Note: Broken links written as relative paths are included in the above build report. For broken links written as absolute paths or external URLs, see the broken link report.

@opbld31

opbld31 commented Mar 30, 2022

Docs Build status updates of commit 1445249:

✅ Validation status: passed

| File | Status | Preview URL | Details |
| --- | --- | --- | --- |
| xml/System.Text.RegularExpressions/CaptureCollection.xml | ✅ Succeeded | View | |
| xml/System.Text.RegularExpressions/MatchCollection.xml | ✅ Succeeded | View | |
| xml/System.Text.RegularExpressions/Regex.xml | ✅ Succeeded | View | |
| xml/System.Text.RegularExpressions/RegexCompilationInfo.xml | ✅ Succeeded | View | |

For more details, please refer to the build report.

Note: Broken links written as relative paths are included in the above build report. For broken links written as absolute paths or external URLs, see the broken link report.

@danmoseley danmoseley merged commit d58f72b into dotnet:main Mar 30, 2022
@danmoseley danmoseley deleted the regex.timeout branch March 30, 2022 16:29
@joperezr
Member

Thanks for fixing this @danmoseley. @gewarren @carlossanlop do these doc updates ever flow back to the repo's triple-slash comments? I suppose the answer is no, and if so I can make these updates on that side too.

@danmoseley
Member Author

danmoseley commented Mar 30, 2022

My assumption was that we don't flow changes backwards; we just wait until the point where we want to make the code of this library the source of truth, at which point someone has to manually find the best combination of both sides and update the sources. If that's right, then there's not much value in updating the text in the code until then: the content is so divergent at this point that it wouldn't merge properly anyway. (?) @carlossanlop ?
