Clarify regular expression timeout #7894
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
Docs Build status updates of commit a0266d6: ✅ Validation status: passed
Co-authored-by: Genevieve Warren <[email protected]>
Docs Build status updates of commit 1445249: ✅ Validation status: passed
Thanks for fixing this @danmoseley. @gewarren @carlossanlop do these doc updates ever flow back to the repo's triple-slash comments? I suppose the answer is no, and if so I can make these updates on that side too.
My assumption was that we don't flow backwards; we just wait until the point where we want to make the code of this library the source of truth, at which point someone has to manually find the best combination of both sides and update the sources. If that's right, then there's not much value in updating the text in the code until then: the content is so divergent at this point that it wouldn't merge properly anyway. @carlossanlop?
Clarify the effect of the timeout in one place (there's a huge number of places timeout is mentioned, but this seems like the key one). I'll also open a PR against the docs repo for the best practices as well.
Also fix some ".NET Framework" to be ".NET"
context
Various regular expression APIs accept a timeout parameter. (It can also be defaulted through the AppDomain.) This appears to set an upper bound on the execution time of the regular expression. However, the purpose of the timeout feature is not to put a hard limit on the execution time of arbitrary patterns. It is specifically to help prevent denial-of-service attacks that exploit backtracking behavior, which might otherwise cause quadratic or worse execution times. Depending on the pattern, it may not be feasible to determine whether an untrusted input will cause this behavior without actually running the match, and that makes the timeout feature necessary.
In some cases, matching may take arbitrarily longer than the timeout specifies. One example is where the execution time is dominated by simply scanning the input for literal text (such as a newline character). This operation is essentially linear in the size of the input, with the constant varying depending on the pattern and input, and is entirely optimized for speed. It does not check the timeout. To mitigate the execution time of this phase when the input is untrusted, limit the size of that input yourself before starting the match.
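To illustrate the two mitigations described above, here is a minimal sketch that combines a match timeout with a caller-side cap on input length. The class name `BoundedRegex`, the 10,000-character limit, the 250 ms timeout, and the pattern itself are all assumptions chosen for illustration; they are not taken from this PR or from the documentation it changes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundedRegex
{
    // Hypothetical cap on untrusted input; pick a value that suits your workload.
    private const int MaxInputLength = 10_000;

    // The timeout guards against excessive backtracking. It is not a hard
    // limit on total execution time, so the length check below is still needed.
    private static readonly Regex Pattern = new Regex(
        @"^(\w+\s?)+$",                   // illustrative pattern that is prone to backtracking
        RegexOptions.None,
        TimeSpan.FromMilliseconds(250));  // illustrative timeout value

    // Returns true or false for a completed match attempt, or null when the
    // input was rejected for its length or the match timed out.
    public static bool? IsMatchOrNull(string input)
    {
        if (input is null || input.Length > MaxInputLength)
        {
            return null;
        }

        try
        {
            return Pattern.IsMatch(input);
        }
        catch (RegexMatchTimeoutException)
        {
            return null;
        }
    }
}
```

A caller would treat a `null` result as "input rejected" and fail closed, for example `if (BoundedRegex.IsMatchOrNull(userInput) != true) { /* reject */ }`. The length check runs first because, as described above, the linear scanning phase does not check the timeout.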