Description
str.IndexOf(stringToFind) returns the index of the start of the match but there is no way to tell the length of the match. The match length can be different from the search input (stringToFind.Length) in a culture specific mode such as CurrentCulture.
For example:
Console.WriteLine("Straße".IndexOf("Straße", StringComparison.CurrentCulture)); //0
Console.WriteLine("Straße".IndexOf("Strasse", StringComparison.CurrentCulture)); //0
Console.WriteLine("Strasse".IndexOf("Straße", StringComparison.CurrentCulture)); //0
Console.WriteLine("Strasse".IndexOf("Strasse", StringComparison.CurrentCulture)); //0
Console.WriteLine("Straße".Length); //6
Console.WriteLine("Straße".Normalize(NormalizationForm.FormC).Length); //6
Console.WriteLine("Straße".Normalize(NormalizationForm.FormD).Length); //6
Console.WriteLine("Strasse".Length); //7
Straße and Strasse are equivalent according to German string processing rules. The example code shows that the two forms match each other. This means that a longer string can be matched in a shorter string and the other way around.
The Framework should expose the length of the match. For example, for "Straße".IndexOf("Strasse", StringComparison.CurrentCulture) the reported match length should be 6 although stringToFind.Length == 7. For "Strasse".IndexOf("Straße", StringComparison.CurrentCulture) the reported match length should be 7 although stringToFind.Length == 6.
The motivating use case is HTML word highlighting. If I want to highlight all occurrences of Straße with <b></b> then I need to determine the length of the match.
Doing this currently requires either Windows-specific P/Invoke or a search loop that repeatedly calls IndexOf: https://stackoverflow.com/questions/20480016/length-of-substring-matched-by-culture-sensitive-string-indexof-method
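To make the cost of the search-loop workaround concrete, here is a minimal sketch of one way it could look. It is not necessarily the approach from the linked answer. MatchLengthSketch, its IndexOf helper, and Highlight are hypothetical names used only for illustration and are not framework APIs. The helper assumes that the shortest substring at the match position that compares equal to the search value is an acceptable match length, which may not hold for every culture or for ignorable characters. HTML escaping is left out for brevity.

```csharp
using System;
using System.Text;

static class MatchLengthSketch
{
    // Hypothetical helper (not a framework API): returns the index of the first
    // match at or after startIndex and reports how many chars of `source` it covers.
    public static int IndexOf(string source, string value, int startIndex,
                              StringComparison comparisonType, out int matchLength)
    {
        matchLength = 0;
        int index = source.IndexOf(value, startIndex, comparisonType);
        if (index < 0)
            return -1;

        // Try candidate lengths until the substring compares equal to `value`.
        // Assumption: the shortest equal substring is the match we want.
        for (int length = 0; length <= source.Length - index; length++)
        {
            if (string.Equals(source.Substring(index, length), value, comparisonType))
            {
                matchLength = length;
                return index;
            }
        }

        // Assumption: if no candidate compares equal, fall back to value.Length (clamped).
        matchLength = Math.Min(value.Length, source.Length - index);
        return index;
    }

    // Wraps every occurrence of `term` in <b></b>, using the helper above.
    public static string Highlight(string text, string term, StringComparison comparisonType)
    {
        if (term.Length == 0)
            return text;

        var sb = new StringBuilder();
        int pos = 0;
        while (pos < text.Length)
        {
            int index = IndexOf(text, term, pos, comparisonType, out int length);
            if (index < 0 || length == 0)
                break; // no further match, or a zero-length match we cannot advance past

            sb.Append(text, pos, index - pos);                          // text before the match
            sb.Append("<b>").Append(text, index, length).Append("</b>"); // the match itself
            pos = index + length;
        }
        sb.Append(text, pos, text.Length - pos);                        // remaining tail
        return sb.ToString();
    }
}
```

A call such as MatchLengthSketch.Highlight(text, "Straße", StringComparison.CurrentCulture) would then wrap each match in its own length rather than in stringToFind.Length. Each match allocates up to one substring per candidate length, which is the kind of overhead an out int matchLength overload would avoid.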
I also found that Regex can accurately report the match length. But Regex is slow and awkward to use for non-constant search strings. I'm also not sure if Regex supports the exact same string processing semantics that the normal string functions have.
API proposal:
class String {
public int IndexOf(string value, int startIndex, int count, StringComparison comparisonType, out int matchLength);
public int LastIndexOf(string value, int startIndex, int count, StringComparison comparisonType, out int matchLength);
public bool StartsWith(string value, StringComparison comparisonType, out int matchLength);
public bool EndsWith(string value, StringComparison comparisonType, out int matchLength);
}
static class MemoryExtensions {
public static int IndexOf(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparisonType, out int matchLength);
public static int LastIndexOf(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparisonType, out int matchLength);
public static bool StartsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparisonType, out int matchLength);
public static bool EndsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparisonType, out int matchLength);
}
We could also add this to StartsWith and EndsWith. They are currently implemented in terms of IndexOf.