[SPARK-18223] [CORE] Optimise PartitionedAppendOnlyMap implementation #15735

a-roberts · 2016-11-02T11:31:34Z

What changes were proposed in this pull request?

This class, like the PartitionedPairBuffer class, are both core Spark data structures that allow us to spill data to disk.

From the comment in ExternalSorter before instantiating said data structures:
// Data structures to store in-memory objects before we spill. Depending on whether we have an
// Aggregator set, we either put objects into an AppendOnlyMap where we combine them, or we
// store them in an array buffer.

All of our data within RDDs has a partition ID and the ordering operations will order by a partition before any other criteria. Such data structures share a partitionKeyComparator from WriteablePartitionedPairCollection.

While this change adds more code, it is the bad iterator wrapping we remove that has a negative performance impact. In this case we avoid said wrapping to help the inliner. When avoided we've observed a 3% PageRank performance increase on HiBench large for both IBM's SDK for Java and OpenJDK 8 as a result of the inliner being better able to figure out what's going on. This observation is seen when combined with an optimisation PartitionedPairBuffer implementation I'll also contribute.

How was this patch tested?

Existing unit tests and HiBench large, PageRank benchmark specifically.

This class, like the PartitionedPairBuffer class, are both core Spark data structures that allow us to spill data to disk. From the comment in ExternalSorter before instantiating said data structures: // Data structures to store in-memory objects before we spill. Depending on whether we have an // Aggregator set, we either put objects into an AppendOnlyMap where we combine them, or we // store them in an array buffer. All of our data within RDDs has a partition ID and the ordering operations will order by a partition before any other criteria. Such data structures share a partitionKeyComparator from WriteablePartitionedPairCollection. While this change adds more code, it is the bad iterator wrapping we remove that has a negative performance impact. In this case we avoid said wrapping to help the inliner. When avoided we've observed a 3% PageRank performance increase on HiBench large for both IBM's SDK for Java and OpenJDK 8 as a result of the inliner being better able to figure out what's going on. This observation is seen when combined with an optimisation PartitionedPairBuffer implementation I'll also contribute.

SparkQA · 2016-11-02T11:34:43Z

Test build #67983 has finished for PR 15735 at commit 7d2e53a.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2016-11-02T13:00:37Z

Oh, yeah this is virtually identical to #15736 where I've already commented. I don't see a value in handing this separately?

a-roberts · 2016-11-02T13:13:25Z

Good point, I'll contribute the changes here with #15736 and address your comments there for both changes

a-roberts mentioned this pull request Nov 2, 2016

[SPARK-18224] [CORE] Optimise PartitionedPairBuffer implementation #15736

Closed

a-roberts closed this Nov 2, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[SPARK-18223] [CORE] Optimise PartitionedAppendOnlyMap implementation #15735

[SPARK-18223] [CORE] Optimise PartitionedAppendOnlyMap implementation #15735

Uh oh!

a-roberts commented Nov 2, 2016

Uh oh!

SparkQA commented Nov 2, 2016

Uh oh!

srowen commented Nov 2, 2016

Uh oh!

a-roberts commented Nov 2, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[SPARK-18223] [CORE] Optimise PartitionedAppendOnlyMap implementation #15735

[SPARK-18223] [CORE] Optimise PartitionedAppendOnlyMap implementation #15735

Uh oh!

Conversation

a-roberts commented Nov 2, 2016

What changes were proposed in this pull request?

How was this patch tested?

Uh oh!

SparkQA commented Nov 2, 2016

Uh oh!

srowen commented Nov 2, 2016

Uh oh!

a-roberts commented Nov 2, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants