[SPARK-12688][SQL] Fix spill size metric in unsafe external sorter #10634

carsonwang · 2016-01-07T07:16:21Z

When running a SQL aggregation in Tungsten mode, the spill size metric may not be updated even though data has been spilled. Because the data is stored in the BytesToBytesMap rather than in the UnsafeExternalSorter, the memory is freed when the map resets, not when the sorter spills.
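A minimal sketch of the failure mode described above, in plain Java rather than Spark code. `ToyMap`, `ToySorter`, and `ToyMetrics` are hypothetical stand-ins for BytesToBytesMap, UnsafeExternalSorter, and the task metrics, and the 64 MB figure is invented. The sketch assumes the sorter reports only the memory it frees itself, which is what the description implies.

```java
// Hypothetical stand-ins; none of these types are Spark classes.
class ToyMetrics {
    long memoryBytesSpilled = 0;
    void incMemoryBytesSpilled(long n) { memoryBytesSpilled += n; }
}

class ToyMap {
    long bytesHeld;
    ToyMap(long bytesHeld) { this.bytesHeld = bytesHeld; }

    // Resetting the map is what actually releases the record memory.
    long reset() {
        long freed = bytesHeld;
        bytesHeld = 0;
        return freed;
    }
}

class ToySorter {
    final ToyMetrics metrics;
    long ownBytes = 0; // in this model the sorter owns no record memory

    ToySorter(ToyMetrics metrics) { this.metrics = metrics; }

    // The sorter reports only the memory it frees itself.
    long spill() {
        long spillSize = ownBytes;
        ownBytes = 0;
        metrics.incMemoryBytesSpilled(spillSize);
        return spillSize;
    }
}

class SpillMetricBugSketch {
    static long run() {
        ToyMetrics metrics = new ToyMetrics();
        ToyMap map = new ToyMap(64L * 1024 * 1024);
        ToySorter sorter = new ToySorter(metrics);
        sorter.spill(); // spills, but frees 0 bytes of its own
        map.reset();    // frees 64 MB, but nothing records it
        return metrics.memoryBytesSpilled;
    }

    public static void main(String[] args) {
        System.out.println("memoryBytesSpilled = " + run());
    }
}
```

In this toy model the metric stays at zero even though the map released 64 MB, which mirrors the symptom reported here.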

SparkQA · 2016-01-07T07:38:22Z

Test build #48913 has finished for PR 10634 at commit 2fd3ab6.

This patch fails MiMa tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-01-07T07:52:36Z

Test build #48915 has finished for PR 10634 at commit 81e3227.

This patch fails MiMa tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-01-07T10:53:05Z

Test build #48923 has finished for PR 10634 at commit be67488.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-01-08T03:14:55Z

Test build #48992 has finished for PR 10634 at commit 416d73d.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class JavaSampleActorReceiver<T> extends JavaActorReceiver
- public class JavaActorWordCount
- abstract class ActorReceiver extends Actor
- abstract class JavaActorReceiver extends UntypedActor

JoshRosen · 2016-01-13T21:54:32Z

sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java

According to my understanding of spill metrics, a spill needs to update both memoryBytesSpilled and diskBytesSpilled. The memory spill is the in-memory size of the data being spilled, while the disk spill records the size of that data after it has been serialized and written to disk. As a result, I think there must be a corresponding incDiskBytesSpilled call somewhere. I think this memory spill metric should be updated closer to the site where we increment the disk bytes spilled rather than here, since that would make it easier to reason about whether we're double-counting.

If this does turn out to be the right place for this spill, it would be great to add a code comment explaining the rationale for why this call must be here.
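The distinction between the two metrics can be illustrated with a small sketch. This is plain Java, not Spark code: `ToySpillMetrics` and `spillOnce` are hypothetical names and the byte counts are invented. It only shows the suggestion above of incrementing both counters at a single site, so that each spill is counted exactly once.

```java
// Hypothetical metrics holder; not a Spark class.
class ToySpillMetrics {
    long memoryBytesSpilled = 0; // in-memory size of the data that was spilled
    long diskBytesSpilled = 0;   // size of that data after serialization to disk

    void incMemoryBytesSpilled(long n) { memoryBytesSpilled += n; }
    void incDiskBytesSpilled(long n) { diskBytesSpilled += n; }
}

class PairedSpillSketch {
    // One spill event updates both counters together, at one site.
    static void spillOnce(ToySpillMetrics m, long inMemoryBytes, long bytesWrittenToDisk) {
        m.incMemoryBytesSpilled(inMemoryBytes);
        m.incDiskBytesSpilled(bytesWrittenToDisk);
    }

    static ToySpillMetrics run() {
        ToySpillMetrics m = new ToySpillMetrics();
        spillOnce(m, 1000L, 400L); // first spill: 1000 bytes in memory, 400 on disk
        spillOnce(m, 2000L, 900L); // second spill: 2000 bytes in memory, 900 on disk
        return m;
    }

    public static void main(String[] args) {
        ToySpillMetrics m = run();
        System.out.println("memoryBytesSpilled = " + m.memoryBytesSpilled);
        System.out.println("diskBytesSpilled = " + m.diskBytesSpilled);
    }
}
```

Keeping the two increments adjacent means a reader can check one place to confirm that neither counter is updated twice for the same spill.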

JoshRosen · 2016-01-24

Ping @carsonwang, do you plan to update this PR to address my comment?

carsonwang · 2016-01-25

Sorry for the delay, @JoshRosen. I will update this soon.

carsonwang · 2016-01-25T04:47:12Z

retest this please

SparkQA · 2016-01-25T06:45:43Z

Test build #49974 has finished for PR 10634 at commit 4486071.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

carsonwang · 2016-01-25T07:04:58Z

@JoshRosen, I now also update diskBytesSpilled. Previously it was not updated for aggregation. Please take a look.

carsonwang · 2016-01-25T07:09:33Z

core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java

For SQL aggregation, the spillSize here is 0 because the data is stored in a map rather than in this sorter, so incMemoryBytesSpilled(spillSize) actually adds 0. We need to update memoryBytesSpilled after freeing the memory in the map.
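A rough sketch of the approach described in this comment, in plain Java rather than the actual patch. `AggMap`, `AggMetrics`, and the 64 MB size are hypothetical; the only point is that the bytes released by the map's reset are what get added to the memory spill metric.

```java
// Hypothetical stand-ins; none of these types are Spark classes.
class AggMetrics {
    long memoryBytesSpilled = 0;
    void incMemoryBytesSpilled(long n) { memoryBytesSpilled += n; }
}

class AggMap {
    long bytesHeld;
    AggMap(long bytesHeld) { this.bytesHeld = bytesHeld; }

    // Frees the map's memory and reports how many bytes were released.
    long reset() {
        long freed = bytesHeld;
        bytesHeld = 0;
        return freed;
    }
}

class SpillMetricFixSketch {
    static long run() {
        AggMetrics metrics = new AggMetrics();
        AggMap map = new AggMap(64L * 1024 * 1024);

        // The sorter's own spill size is 0 here, so this call adds nothing.
        long sorterSpillSize = 0L;
        metrics.incMemoryBytesSpilled(sorterSpillSize);

        // Record the memory that is actually released when the map is reset.
        long freedByMap = map.reset();
        metrics.incMemoryBytesSpilled(freedByMap);

        return metrics.memoryBytesSpilled;
    }

    public static void main(String[] args) {
        System.out.println("memoryBytesSpilled = " + run());
    }
}
```

In this model the metric ends up equal to the bytes the map held, because the increment happens where the memory is freed.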

carsonwang · 2016-01-25T07:10:38Z

retest this please

SparkQA · 2016-01-25T07:18:13Z

Test build #49976 has finished for PR 10634 at commit 4486071.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-01-25T09:43:44Z

Test build #49984 has finished for PR 10634 at commit d689873.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-01-25T10:53:47Z

Test build #49983 has finished for PR 10634 at commit 4486071.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

carsonwang · 2016-02-06T02:16:20Z

@JoshRosen, do you have any further comments?

carsonwang · 2016-02-06T02:18:00Z

/cc @cloud-fan @andrewor14, did you see a spill size > 0 when the UI was introduced? Can you take a look at this fix?

cloud-fan · 2016-02-06T02:35:42Z

Is it possible to write a test for this bug?

SparkQA · 2016-02-06T04:18:56Z

Test build #50857 has finished for PR 10634 at commit 4cc0862.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2016-06-15T22:32:47Z

Thanks for the pull request. I'm going through a list of pull requests to cut them down since the sheer number is breaking some of the tooling we have. Due to lack of activity on this pull request, I'm going to push a commit to close it. Feel free to reopen it or create a new one. We can also continue the discussion on the JIRA ticket.

carsonwang added 2 commits January 7, 2016 14:57

2fd3ab6  Fix spill size metric in unsafe external sorter
81e3227  Update the comment
be67488  Fix mima error
416d73d  Merge branch 'master' into FixSpillSiz

JoshRosen reviewed Jan 13, 2016

carsonwang added 2 commits January 25, 2016 11:20

d1e9f7d  Update the diskBytesSpilled metric
4486071  Merge branch 'master' into FixSpillSiz

carsonwang reviewed Jan 25, 2016

d689873  Fix spillsize value
4cc0862  Merge branch 'master' into FixSpillSiz

asfgit closed this in 1a33f2e Jun 15, 2016

5 participants