Skip to content

Conversation

@andrewor14
Copy link
Contributor

The issue is that a task may run multiple sorts, and the sorts run by the child operator (i.e. parent RDD) may acquire all available memory such that other sorts in the same task do not have enough to proceed. This manifests itself in an IOException("Unable to acquire X bytes of memory") thrown by UnsafeExternalSorter.

The solution is to reserve a page in each sorter in the chain before computing the child operator's (parent RDD's) partitions. This requires us to use a new special RDD that does some preparation before computing the parent's partitions.

Andrew Or added 4 commits August 6, 2015 12:05
The MapPartitionsWithPreparationRDDSuite simulates the condition
we are trying to fix, which is that the child can acquire memory
before the parent.
@andrewor14 andrewor14 force-pushed the unsafe-starve-memory branch from fdd3c92 to 5d5afdf Compare August 6, 2015 22:35
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

used in tests

@andrewor14 andrewor14 force-pushed the unsafe-starve-memory branch from 7c34b09 to 0b07782 Compare August 6, 2015 22:38
@andrewor14
Copy link
Contributor Author

@rxin @JoshRosen

@rxin
Copy link
Contributor

rxin commented Aug 6, 2015

High level approach looks good.

Would be great to simplify the test case to not rely on memory management. Just use an atomicinteger somewhere.

@andrewor14 andrewor14 force-pushed the unsafe-starve-memory branch from db8b6e0 to 35b69a4 Compare August 6, 2015 23:39
@SparkQA
Copy link

SparkQA commented Aug 7, 2015

Test build #40091 has finished for PR 8011 at commit 5d5afdf.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor

rxin commented Aug 7, 2015

LGTM

@andrewor14
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Aug 7, 2015

Test build #40093 timed out for PR 8011 at commit 0b07782 after a configured wait of 175m.

@SparkQA
Copy link

SparkQA commented Aug 7, 2015

Test build #40103 has finished for PR 8011 at commit db8b6e0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Aug 7, 2015
The issue is that a task may run multiple sorts, and the sorts run by the child operator (i.e. parent RDD) may acquire all available memory such that other sorts in the same task do not have enough to proceed. This manifests itself in an `IOException("Unable to acquire X bytes of memory")` thrown by `UnsafeExternalSorter`.

The solution is to reserve a page in each sorter in the chain before computing the child operator's (parent RDD's) partitions. This requires us to use a new special RDD that does some preparation before computing the parent's partitions.

Author: Andrew Or <[email protected]>

Closes #8011 from andrewor14/unsafe-starve-memory and squashes the following commits:

35b69a4 [Andrew Or] Simplify test
0b07782 [Andrew Or] Minor: update comments
5d5afdf [Andrew Or] Merge branch 'master' of github.com:apache/spark into unsafe-starve-memory
254032e [Andrew Or] Add tests
234acbd [Andrew Or] Reserve a page in sorter when preparing each partition
b889e08 [Andrew Or] MapPartitionsWithPreparationRDD

(cherry picked from commit 014a9f9)
Signed-off-by: Reynold Xin <[email protected]>
@asfgit asfgit closed this in 014a9f9 Aug 7, 2015
@SparkQA
Copy link

SparkQA commented Aug 7, 2015

Test build #40107 has finished for PR 8011 at commit 35b69a4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14 andrewor14 deleted the unsafe-starve-memory branch August 7, 2015 04:04
@SparkQA
Copy link

SparkQA commented Aug 7, 2015

Test build #1397 has finished for PR 8011 at commit 35b69a4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 7, 2015

Test build #1396 has finished for PR 8011 at commit 35b69a4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 7, 2015

Test build #40124 timed out for PR 8011 at commit 35b69a4 after a configured wait of 175m.

asfgit pushed a commit that referenced this pull request Aug 12, 2015
This is the sister patch to #8011, but for aggregation.

In a nutshell: create the `TungstenAggregationIterator` before computing the parent partition. Internally this creates a `BytesToBytesMap` which acquires a page in the constructor as of this patch. This ensures that the aggregation operator is not starved since we reserve at least 1 page in advance.

rxin yhuai

Author: Andrew Or <[email protected]>

Closes #8038 from andrewor14/unsafe-starve-memory-agg.

(cherry picked from commit e011079)
Signed-off-by: Reynold Xin <[email protected]>
asfgit pushed a commit that referenced this pull request Aug 12, 2015
This is the sister patch to #8011, but for aggregation.

In a nutshell: create the `TungstenAggregationIterator` before computing the parent partition. Internally this creates a `BytesToBytesMap` which acquires a page in the constructor as of this patch. This ensures that the aggregation operator is not starved since we reserve at least 1 page in advance.

rxin yhuai

Author: Andrew Or <[email protected]>

Closes #8038 from andrewor14/unsafe-starve-memory-agg.
CodingCat pushed a commit to CodingCat/spark that referenced this pull request Aug 17, 2015
This is the sister patch to apache#8011, but for aggregation.

In a nutshell: create the `TungstenAggregationIterator` before computing the parent partition. Internally this creates a `BytesToBytesMap` which acquires a page in the constructor as of this patch. This ensures that the aggregation operator is not starved since we reserve at least 1 page in advance.

rxin yhuai

Author: Andrew Or <[email protected]>

Closes apache#8038 from andrewor14/unsafe-starve-memory-agg.
asfgit pushed a commit that referenced this pull request Oct 30, 2015
Since we do not need to preserve a page before calling compute(), MapPartitionsWithPreparationRDD is not needed anymore.

This PR basically revert #8543, #8511, #8038, #8011

Author: Davies Liu <[email protected]>

Closes #9381 from davies/remove_prepare2.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants