Only scan the diff once when processing carriage returns #1270
I did consider scanning through it one char at a time, but took a punt that since Would you be interested in seeing them profiled in BenchmarkDotNet?
I've just done a couple of ad-hoc benchmarks. One for this implementation:

    static void Benchmark_ReadLineAndCountCarriageReturns_100000()
    {
        // use current file as test data
        var file = new System.Diagnostics.StackFrame(true).GetFileName();
        var text = File.ReadAllText(file);
        for (int count = 0; count < 100000; count++)
        {
            var lineReader = new DiffUtilities.LineReader(text);
            DiffUtilities.LineReader.LineInformation line;
            while ((line = lineReader.ReadLine()).Line != null)
            {
                int crs = line.CarriageReturns;
            }
        }
    }

A similar one for the original implementation:

    static void Benchmark_ReadLineAndCountCarriageReturns_100000()
    {
        // use current file as test data
        var file = new System.Diagnostics.StackFrame(true).GetFileName();
        var text = File.ReadAllText(file);
        for (int count = 0; count < 100000; count++)
        {
            var lineReader = new DiffUtilities.LineReader(text);
            string line;
            while ((line = lineReader.ReadLine()) != null)
            {
                DiffUtilities.CountCarriageReturns(line);
            }
        }
    }

The first one completes in ~26.91 seconds, the second one in ~6.27 seconds. I think
jcansdale
left a comment
I've done some very basic benchmarking. Could you take a look and maybe feed it some more representative data?
I guess we could use this sample data for a large PR:
https://patch-diff.githubusercontent.com/raw/github/VisualStudio/pull/1004.diff
    if (index != -1)
    var carriageReturns = 0;
    StringBuilder sb = new StringBuilder();
    for (; index < length; index++)
We could maybe adapt this implementation to use `text.IndexOfAny(new[] {'\r', '\n'}, index)`, with the `new[] {'\r', '\n'}` array stored in a static? 😉
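Roughly what I have in mind, as a sketch only. `IndexOfAnyLineScanner`, `LineChars` and the `ReadLine(text, ref index, out carriageReturns)` shape are made-up names for illustration, not what is in the PR. It also assumes that a line ends at `'\n'` and that every `'\r'` before it is counted and left in the returned text, which may not match the real `LineReader`.

```csharp
using System;

public static class IndexOfAnyLineScanner
{
    // Allocated once and reused by every IndexOfAny call.
    static readonly char[] LineChars = { '\r', '\n' };

    // Returns the line starting at index (without its terminating '\n'),
    // counts the '\r' characters seen in that line, and advances index
    // past the line. Returns null once the end of the text is reached.
    public static string ReadLine(string text, ref int index, out int carriageReturns)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        carriageReturns = 0;
        if (index >= text.Length) return null;

        var start = index;
        var pos = start;
        while (true)
        {
            // Jump straight to the next '\r' or '\n'.
            pos = text.IndexOfAny(LineChars, pos);
            if (pos == -1)
            {
                // Last line has no terminating '\n'.
                index = text.Length;
                return text.Substring(start);
            }

            if (text[pos] == '\n')
            {
                index = pos + 1;
                return text.Substring(start, pos - start);
            }

            // It was a '\r': count it and keep looking for the '\n'.
            carriageReturns++;
            pos++;
        }
    }
}
```

Whether this beats the char-by-char loop would need measuring, since `Substring` still copies the line after `IndexOfAny` has scanned it.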
Here are some alternative implementations we tried: Merged #1268
The previous solution still scans the string 3 times, once for each `IndexOf` and `Substring` call. Given that we have to go through the string at least once to find the boundaries, we should grab all the information we need along the way. Also, given that there is a custom `LineReader` class, we can take advantage of that and return all the data we need in one go.

Also add null checks for invalid data being passed into the constructor. The current caller of `LineReader` probably never calls it with null, but since `LineReader` is a public class, other code in the future might decide to call it (or it could be expanded), and we should make sure the assumptions we make are documented.
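For illustration, here is a minimal sketch of the single-pass idea described above. It is not the code in this PR: the class name `SinglePassLineReader` is hypothetical, and the sketch assumes that lines end at `'\n'`, that each `'\r'` is counted and kept in the returned text, and that a null `Line` signals the end of the input.

```csharp
using System;
using System.Text;

public sealed class SinglePassLineReader
{
    // Everything the caller needs about one line, gathered in one scan.
    public struct LineInformation
    {
        public LineInformation(string line, int carriageReturns)
        {
            Line = line;
            CarriageReturns = carriageReturns;
        }

        public string Line { get; }
        public int CarriageReturns { get; }
    }

    readonly string text;
    int index;

    public SinglePassLineReader(string text)
    {
        // Fail fast so the non-null assumption is documented in code.
        if (text == null) throw new ArgumentNullException(nameof(text));
        this.text = text;
    }

    public LineInformation ReadLine()
    {
        // A null Line tells the caller there is nothing left to read.
        if (index >= text.Length) return new LineInformation(null, 0);

        var carriageReturns = 0;
        var sb = new StringBuilder();
        for (; index < text.Length; index++)
        {
            var c = text[index];
            if (c == '\n')
            {
                index++; // consume the terminator and stop
                break;
            }

            if (c == '\r') carriageReturns++;
            sb.Append(c);
        }

        return new LineInformation(sb.ToString(), carriageReturns);
    }
}
```

A caller would loop with `while ((line = reader.ReadLine()).Line != null)` and read `line.CarriageReturns` directly, rather than rescanning each returned string.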