[SPARK-18224] [CORE] Optimise PartitionedPairBuffer implementation #15736
This change is very similar to my pull request for improving PartitionedPairAppendOnlyMap: #15735. Summarising (more detail above), we avoid the slow iterator wrapping in favour of helping the inliner. We observed that this, when combined with the above change, leads to a 3% performance increase on the HiBench large PageRank benchmark with both IBM's SDK for Java and with OpenJDK 8.
|
Test build #67984 has finished for PR 15736 at commit
|
| } else | ||
| new Comparator[(Int, K)] { | ||
| override def compare(a: (Int, K), b: (Int, K)): Int = { | ||
| val partitionDiff = a._1 - b._1 |
There are some indentation problems here and the else clause is missing a brace. I think you can omit the type of comparator; no space before the colon in any event.
This subtraction can overflow in theory and give the wrong answer, but the existing code does it, so, pass on that.
While optimizing, do you want to call keyComparator.get outside the class definition?
There's a similar construct in PartitionedAppendOnlyMap that should be changed too. Can this be refactored maybe?
Can the method partitionKeyComparator go away? I think the whole WritablePartitionedPairCollection object goes away after this if you care to 'inline' it too in the one refactored instance.
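To illustrate the overflow concern raised in this review, here is a minimal sketch. The object and value names are hypothetical and not part of the patch; `java.lang.Integer.compare` is shown only as one possible overflow-safe alternative, not as what the PR adopted. Note that if both partition IDs are non-negative, the difference of two `Int` values cannot overflow, so the problem only appears when a negative value is involved.

```scala
import java.util.Comparator

// Hypothetical names, for illustration only.
object PartitionCompareSketch {
  // Subtraction-based comparison, mirroring the expression in the diff.
  val bySubtraction: Comparator[(Int, String)] = new Comparator[(Int, String)] {
    override def compare(a: (Int, String), b: (Int, String)): Int = a._1 - b._1
  }

  // Overflow-safe alternative: compares the values without subtracting them.
  val byIntegerCompare: Comparator[(Int, String)] = new Comparator[(Int, String)] {
    override def compare(a: (Int, String), b: (Int, String)): Int =
      java.lang.Integer.compare(a._1, b._1)
  }
}
```

With `Int.MinValue` on the left and `1` on the right, the subtraction wraps around to a positive number and reports the wrong order, while `Integer.compare` reports the left side as smaller.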
|
Will be adding the commit from #15735 here upon addressing the feedback |
Inline benefit with this approach as we avoid the bad iterator wrapping
|
Addressed the scalastyle comments and added the PartitionedAppendOnlyMap change here as per the above suggestions, will look at the review comments next Two unrelated asides
|
|
Test build #67986 has finished for PR 15736 at commit
|
| partitionDiff | ||
| } else { | ||
| keyComparator.get.compare(a._2, b._2) | ||
| } |
You can dereference the option to avoid get in inner loop
Yeah I think Github collapsed it but there's this and other suggestions at ... #15736 (review)
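The suggestion to dereference the `Option` outside the inner loop could look roughly like the sketch below. The object and method names (`HoistedComparatorSketch`, `derefInLoop`, `derefOnce`) are made up for this illustration, and both methods assume the `Option` is defined; the code is not taken from the patch.

```scala
import java.util.Comparator

// Hypothetical names, for illustration only.
object HoistedComparatorSketch {
  // Calls Option.get on every comparison whose partition IDs are equal.
  def derefInLoop[K](keyComparator: Option[Comparator[K]]): Comparator[(Int, K)] = {
    new Comparator[(Int, K)] {
      override def compare(a: (Int, K), b: (Int, K)): Int = {
        val partitionDiff = a._1 - b._1
        if (partitionDiff != 0) {
          partitionDiff
        } else {
          keyComparator.get.compare(a._2, b._2)
        }
      }
    }
  }

  // Unwraps the Option once, before the comparator object is created.
  def derefOnce[K](keyComparator: Option[Comparator[K]]): Comparator[(Int, K)] = {
    val keyComp = keyComparator.get // throws NoSuchElementException if empty
    new Comparator[(Int, K)] {
      override def compare(a: (Int, K), b: (Int, K)): Int = {
        val partitionDiff = a._1 - b._1
        if (partitionDiff != 0) {
          partitionDiff
        } else {
          keyComp.compare(a._2, b._2)
        }
      }
    }
  }
}
```

Both versions order elements identically; the second keeps only a reference to the unwrapped comparator, so `compare` no longer touches the `Option`.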
|
Test build #67987 has finished for PR 15736 at commit
|
|
To recap some of my feedback here, I think this will be a fine change but it can be refactored further. I think we can refactor this logic that appears twice in one place, perhaps in Since the two methods there are only used by these call sites that are changing, they can be 'inlined' into the one common implementation, which might open up more optimization. It'd be nice to fix the subtraction issue while we're here unless someone is convinced the difference can never overflow.
|
|
@a-roberts this does look like a worthy change, what do you think of the further simplifications here? |
|
Sean, they are great suggestions, thanks -- I'll find the time (like for the other outstanding pull requests) to get your feedback integrated, tested and profiled, currently caught up in packaging our own Apache Spark releases for both 1.6.3 and 2.0.2. I also have a JIRA to create proposing regular performance runs using the latest Spark snapshot builds to track regressions (I have this all set up with scripts already) |
|
I know you're busy but this does look like a good change to finish off. That it's a win is self-evident, just a question of how much, and benchmarks you have already show it is an improvement. I can take it on (credit remains with you) or will just wait if you're getting back to it. |
|
I'm resuming the work for all of these related PRs again this week after the London Spark meetup on Wednesday, if you are keen to take it on I'm more than happy to help out and will share some information here that yourself and others should find useful. Useful tools Benchmarks |
|
Back to working on the performance related JIRAs now, so based on the above helpful comments here's what I'll do:
- Remove the .get.compare from the loop as suggested above - we'll do a .get upfront to get our comparator to use, eliminating the .get later
- Move the duplicated code into the WritablePartitionedPairCollection object so the two methods optimised here will call the above new method (let's say it's called getComparator) before returning accordingly (both methods are the same apart from the final few lines).

PartitionedAppendOnlyMap returns and PartitionedPairBuffer returns:

I'll then build/test/profile this again
| } | ||
| } | ||
|
|
||
| /* Takes an optional parameter (keyComparator), use if provided |
Javadoc/scaladoc starts with /** and usually you leave that alone on one line and start documentation on the next.
| * and returns a comparator for the partitions | ||
| */ | ||
| def getComparator[K](keyComparator: Option[Comparator[K]]) : Comparator[(Int, K)] = { | ||
| val comparator : Comparator[(Int, K)] = |
comparator is now entirely redundant. The whole body is just the if statement
| : Iterator[((Int, K), V)] = { | ||
| val comparator = keyComparator.map(partitionKeyComparator).getOrElse(partitionComparator) | ||
| new Sorter(new KVArraySortDataFormat[(Int, K), AnyRef]).sort(data, 0, curSize, comparator) | ||
| new Sorter(new KVArraySortDataFormat[(Int, K), |
I think breaking the line here is odd. If necessary, pull out the result of getComparator to a statement above to shorten this line, like it was before.
| } else { | ||
| new Comparator[(Int, K)] { | ||
| // We know we have a non-empty comparator here | ||
| val ourKeyComp = keyComparator.get |
I think this should be outside the body of the anonymous class. You don't need a reference to the Option here even in the anonymous class.
| /* Takes an optional parameter (keyComparator), use if provided | ||
| * and returns a comparator for the partitions | ||
| */ | ||
| def getComparator[K](keyComparator: Option[Comparator[K]]) : Comparator[(Int, K)] = { |
Nit: no space before colon
| */ | ||
| def getComparator[K](keyComparator: Option[Comparator[K]]) : Comparator[(Int, K)] = { | ||
| val comparator : Comparator[(Int, K)] = | ||
| if (keyComparator.isEmpty) { |
isDefined is probably a tiny bit more conventional (and then flip the logic here of course)
| comparator | ||
| } | ||
|
|
||
|
|
You can delete partitionKeyComparator below now, right?
| def getComparator[K](keyComparator: Option[Comparator[K]]) : Comparator[(Int, K)] = { | ||
| val comparator : Comparator[(Int, K)] = | ||
| if (keyComparator.isEmpty) { | ||
| partitionComparator |
This can be inlined now
| // We know we have a non-empty comparator here | ||
| val ourKeyComp = keyComparator.get | ||
| override def compare(a: (Int, K), b: (Int, K)): Int = { | ||
| val partitionDiff = a._1 - b._1 |
I'm still not thrilled about the subtraction here but maybe leave it for now
|
Test build #69164 has finished for PR 15736 at commit
|
|
Test build #69165 has finished for PR 15736 at commit
|
|
Test build #69172 has finished for PR 15736 at commit
|
|
I've conducted a lot of performance tests and gathered .hcd files so I can investigate this next week, but it looks like either the first commit is the best for performance or my current configuration with this benchmark results in us being unable to infer if our changes really make a difference.

Sharing some raw data, the format is as follows: benchmark name, date, time, data size in bytes (the same each run), the elapsed time and the throughput (bytes per second).

With the above suggestions for Partitioned*Buffer

Vanilla, no changes at all

Original commit

In Healthcenter I do see that these methods are still great candidates for optimisation as they are all very commonly used. Open to more suggestions, I have exclusive access to lots of hardware, can easily churn out more custom builds and have lots of profiling software we can use. I'll be committing code for the SizeEstimator soon as that's a good candidate for optimisation here as well.
srowen
left a comment
Hm, I don't see why this would be slower than the original version. It should be nearly identical anyway or better, as it further inlines a few things. It could be some weird interactions with the JIT and benchmark or whatever, or maybe some difference in how it was tested.
Try one more round of changes here and benchmark again. In any event it would be worthwhile just for the code streamlining.
| keyComparator.compare(a._2, b._2) | ||
| def getComparator[K](keyComparator: Option[Comparator[K]]): Comparator[(Int, K)] = { | ||
| if (!keyComparator.isDefined) return partitionComparator | ||
| else { |
Style is off here -- you need braces in both clauses, return is redundant, and there's no point in inverting the condition as opposed to just flipping the clauses.
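A sketch of how the helper might read with these style points applied (braces on both clauses, no `return`, condition not inverted) follows. The enclosing object name is hypothetical; `getComparator` takes its name and signature from the diff above, but the body is an illustrative reconstruction, not the code that was committed. It keeps the subtraction from the diff, which assumes partition IDs are non-negative.

```scala
import java.util.Comparator

// Hypothetical enclosing object; only getComparator's signature comes from the diff.
object ComparatorHelperSketch {
  /**
   * Returns a comparator over (partitionId, key) pairs. If a key comparator
   * is provided, ties on partition ID are broken by key; otherwise only the
   * partition ID is compared.
   */
  def getComparator[K](keyComparator: Option[Comparator[K]]): Comparator[(Int, K)] = {
    if (keyComparator.isDefined) {
      // Unwrapped once, outside the anonymous class.
      val keyComp = keyComparator.get
      new Comparator[(Int, K)] {
        override def compare(a: (Int, K), b: (Int, K)): Int = {
          // Assumes non-negative partition IDs, so the subtraction cannot overflow.
          val partitionDiff = a._1 - b._1
          if (partitionDiff != 0) {
            partitionDiff
          } else {
            keyComp.compare(a._2, b._2)
          }
        }
      }
    } else {
      new Comparator[(Int, K)] {
        override def compare(a: (Int, K), b: (Int, K)): Int = a._1 - b._1
      }
    }
  }
}
```

Because `if`/`else` is an expression in Scala, the value of whichever branch runs is the method's result, which is why `return` is redundant here.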
| } else { | ||
| keyComparator.compare(a._2, b._2) | ||
| def getComparator[K](keyComparator: Option[Comparator[K]]): Comparator[(Int, K)] = { | ||
| if (!keyComparator.isDefined) return partitionComparator |
inline and remove partitionComparator, as I think it's not used
| // We know we have a non-empty comparator here | ||
| override def compare(a: (Int, K), b: (Int, K)): Int = { | ||
| val partitionDiff = a._1 - b._1 | ||
| if (partitionDiff != 0) { |
Probably very very slightly better to say
    if (a._1 == b._1) {
      theKeyComp.compare(a._2, b._2)
    } else {
      a._1 - b._1
    }
|
Before progressing I've discussed what I'm seeing with our JIT compiler team, with the refactoring to reduce code duplication, the following occurs which solves some of the mystery -- although it's bad news as, like you, I wanted to remove the duplicate method. Summarising:
|
|
If that's true then again doesn't my suggestion to inline |
|
@srowen how about this for profiling? |
|
Looks right except you just want to write |
|
Good point, done, I can get profiling the below code then? Builds fine and no scalastyle problems |
|
(Particularly) as the number of partitions increases, "if (a._1 != b._1)" might be better for bpt reasons.
|
I see, so doing the != comparison first, which is likely to be true more of the time, means we're not consistently failing this check and then entering the else. Again, that builds fine.
|
Hm why does the order matter - maybe helps branch prediction? I doubt we even know how the bytecode orders this let alone how it is JITted and whether it will gather branching info on this one branch. Either way. I usually prefer == for code clarity all else equal. No need to benchmark both just pick one. |
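For reference, the two branch orders discussed here can be written side by side as below. The object and method names are hypothetical. The sketch only shows that both forms return the same result for any input; it says nothing about which one the JIT handles better, which is the open question in this thread.

```scala
import java.util.Comparator

// Hypothetical names, for illustration only.
object BranchOrderSketch {
  // Tests for equal partition IDs first (the form preferred for clarity).
  def equalsFirst[K](keyComp: Comparator[K]): Comparator[(Int, K)] = {
    new Comparator[(Int, K)] {
      override def compare(a: (Int, K), b: (Int, K)): Int = {
        if (a._1 == b._1) {
          keyComp.compare(a._2, b._2)
        } else {
          a._1 - b._1
        }
      }
    }
  }

  // Tests for differing partition IDs first (the form suggested for many partitions).
  def notEqualsFirst[K](keyComp: Comparator[K]): Comparator[(Int, K)] = {
    new Comparator[(Int, K)] {
      override def compare(a: (Int, K), b: (Int, K)): Int = {
        if (a._1 != b._1) {
          a._1 - b._1
        } else {
          keyComp.compare(a._2, b._2)
        }
      }
    }
  }
}
```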
|
Passed on your question to our JIT developers
Numbers for us Refactored further as above initial commit The first commit performs better on average, I'd like to next add the improved compare code as above and "push this down" into the subclasses to see how this performs |
|
I'd certainly be curious to see a benchmark of the 'final' version with inlined comparator. I would honestly be surprised if that's not fastest of all. |
|
New data for us, inlined comparator scores here (code provided below to check I've not profiled something useless!): Remember our "vanilla" average time is 47.752s and our first commit averaged 47.229s (so not much of a difference really). I think we're splitting hairs and I've got another PR I am seeing good results on that I plan to focus on instead: the SizeEstimator. This is what I've benchmarked, PartitionedAppendOnlyMap first, so let me know if there are any further suggestions, otherwise I propose leaving this one for later as actually against the Spark master codebase I'm not noticing anything exciting. In PartitionedPairBuffer WritablePartitionedPairCollection remains unchanged.
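To put the two averages quoted above in proportion, the relative difference works out to roughly 1.1%. The small sketch below does the arithmetic; the object and method names are hypothetical, and only the two timings come from the comment.

```scala
// Hypothetical names; the two timings are the averages quoted in the comment.
object BenchmarkDeltaSketch {
  val vanillaSeconds: Double = 47.752
  val firstCommitSeconds: Double = 47.229

  // Fraction of the baseline time saved by the candidate.
  def relativeImprovement(baseline: Double, candidate: Double): Double =
    (baseline - candidate) / baseline

  val improvement: Double = relativeImprovement(vanillaSeconds, firstCommitSeconds)
}
```

The absolute gap is 0.523 seconds on a run of about 48 seconds, which fits the remark that the difference is small.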
|
It does seem like nice cleanup in any event. I am not sure why the first commit was faster as this seems like a 'superset' of optimization. We can't use that one in any event. If you want to update the PR with what you posted above, I think it'd be OK to commit just for the code simplification. |
|
@a-roberts let's either finish the thought and merge this as mostly a code cleanup and maybe marginal win, or just close it. |
|
Ping @a-roberts to resolve this |
|
I'm going to manually close this |
Closes apache#15736 Closes apache#16309 Closes apache#16485 Closes apache#16502 Closes apache#16196 Closes apache#16498 Closes apache#12380 Closes apache#16764
What changes were proposed in this pull request?
This change is very similar to my pull request for improving PartitionedPairAppendOnlyMap: #15735
Summarising (more detail above), we avoid the slow iterator wrapping in favour of helping the inliner. We observed that this, when combined with the above change, leads to a 3% performance increase on the HiBench large PageRank benchmark with both IBM's SDK for Java and with OpenJDK 8
How was this patch tested?
Existing unit tests and HiBench large profile with both IBM's SDK for Java and OpenJDK 8, the PageRank benchmark specifically