Minor updates and tweaks

IEvangelist · IEvangelist · commit 9d072ce91243 · 2022-10-03T23:20:29.000-05:00
diff --git a/docs/standard/base-types/regular-expressions-in-depth.md b/docs/standard/base-types/regular-expressions-in-depth.md
@@ -1076,14 +1076,14 @@ As noted earlier when talking about `IgnoreCase`, vectorization is the idea that
 
 One of the most important places for vectorization in a regex engine is when finding the next location a pattern could possibly match. For longer input text being searched, the time to find matches is frequently dominated by this aspect. As such, as of .NET 6, `Regex` had various tricks in place to get to those locations as quickly as possible:
 
-- **Anchors**. For patterns that began with an anchor, it could either avoid doing any searching if there was only one place the pattern could possibly begin (e.g. a "beginning" anchor, like `^` or `\A`), and it could skip past text it knew couldn't match (e.g. `IndexOf('\n')` for a "beginning-of-line" anchor if not currently at the beginning of a line).
-- **Boyer-Moore**. For patterns beginning with a sequence of at least two characters (case-sensitive or case-insensitive), it could use a [Boyer-Moore](https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string-search_algorithm) search to find the next occurrence of that sequence in the input text.
-- **IndexOf(char)**. For patterns beginning with a single case-sensitive character, it could use `IndexOf(char)` to find the next possible match location.
-- **IndexOfAny(char, char, ...)**. For patterns beginning with one of only a few case-sensitive characters, it could use `IndexOfAny(...)` with those characters to find the next possible match location.
+- **Anchors**: For patterns that began with an anchor, it could either avoid doing any searching if there was only one place the pattern could possibly begin (e.g. a "beginning" anchor, like `^` or `\A`), or it could skip past text it knew couldn't match (e.g. `IndexOf('\n')` for a "beginning-of-line" anchor if not currently at the beginning of a line).
+- **Boyer-Moore**: For patterns beginning with a sequence of at least two characters (case-sensitive or case-insensitive), it could use a [Boyer-Moore](https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string-search_algorithm) search to find the next occurrence of that sequence in the input text.
+- **IndexOf(char)**: For patterns beginning with a single case-sensitive character, it could use `IndexOf(char)` to find the next possible match location.
+- **IndexOfAny(char, char, ...)**: For patterns beginning with one of only a few case-sensitive characters, it could use `IndexOfAny(...)` with those characters to find the next possible match location.
 
 These optimizations are all really useful, but there are many additional possible solutions that .NET 7 now takes advantage of:
 
-- **Goodbye, Boyer-Moore**. `Regex` has used the Boyer-Moore algorithm since `Regex`'s earliest days; the `RegexCompiler` even emitted a customized implementation in order to maximize throughput. However, Boyer-Moore was created at a time when vector instruction sets weren't yet a reality. Most modern hardware can examine 8 or 16 16-bit `char`s in just a few instructions, whereas with Boyer-Moore, it's rare to be able to skip that many at a time (the most it can possibly skip at a time is the length of the substring for which it's searching). In the aforementioned corpus of ~19,000 regular expressions, ~50% of those expressions that begin with a case-sensitive prefix of at least two characters have a prefix less than or equal to four characters, and ~75% are less than or equal to eight characters. Moreover, the Boyer-Moore algorithm works by choosing a single character to examine in order to perform each jump, but a well-vectorized algorithm can simultaneously compare multiple characters, such as the first and last in the prefix (as described in [SIMD-friendly algorithms for substring searching](http://0x80.pl/articles/simd-strfind.html#algorithm-1-generic-simd)), enabling it to stay in the inner vectorized loop longer. In .NET 7, `IndexOf` performing an ordinal search for a string has been significantly improved with such tricks, and now in .NET 7, `Regex` uses `IndexOf` rather than Boyer-Moore, the implementation of which has been deleted (this was inspired by Rust's regex crate making a similar change [last year](https://github.com/rust-lang/regex/pull/767)). You can see the impact of this on a micro-benchmark like the following, which is finding every word in a document, creating a `Regex` for that word, and then using each `Regex` to find all occurrences of each word in the document (this would be an ideal use for the new `Count` method, but I'm not using it here as it doesn't exist in the previous releases being compared):
+- **Goodbye, Boyer-Moore**: `Regex` has used the Boyer-Moore algorithm since `Regex`'s earliest days; the `RegexCompiler` even emitted a customized implementation in order to maximize throughput. However, Boyer-Moore was created at a time when vector instruction sets weren't yet a reality. Most modern hardware can examine 8 or 16 16-bit `char`s in just a few instructions, whereas with Boyer-Moore, it's rare to be able to skip that many at a time (the most it can possibly skip at a time is the length of the substring for which it's searching). In the aforementioned corpus of ~19,000 regular expressions, ~50% of those expressions that begin with a case-sensitive prefix of at least two characters have a prefix less than or equal to four characters, and ~75% are less than or equal to eight characters. Moreover, the Boyer-Moore algorithm works by choosing a single character to examine in order to perform each jump, but a well-vectorized algorithm can simultaneously compare multiple characters, such as the first and last in the prefix (as described in [SIMD-friendly algorithms for substring searching](http://0x80.pl/articles/simd-strfind.html#algorithm-1-generic-simd)), enabling it to stay in the inner vectorized loop longer. In .NET 7, `IndexOf` performing an ordinal search for a string has been significantly improved with such tricks, and now in .NET 7, `Regex` uses `IndexOf` rather than Boyer-Moore, the implementation of which has been deleted (this was inspired by Rust's regex crate making a similar change [last year](https://github.com/rust-lang/regex/pull/767)). You can see the impact of this on a micro-benchmark like the following, which is finding every word in a document, creating a `Regex` for that word, and then using each `Regex` to find all occurrences of each word in the document (this would be an ideal use for the new `Count` method, but I'm not using it here as it doesn't exist in the previous releases being compared):
 
 ```csharp
 private string _text;