
Conversation

@aarondav
Contributor

In general, individual shuffle blocks are frequently small, so mmapping them often creates a lot of waste. It may not be bad to mmap the larger ones, but it is pretty inconvenient to get configuration into ManagedBuffer, and besides it is unlikely to help all that much.

…ionManager

Note that use of ManagedBuffer#nioByteBuffer() seems like generally bad practice,
and it would ideally never be used for data that may be large. Users of such data
would ideally stream it instead.
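
As a rough sketch of the behavior discussed above (the helper name `readSegment` and its signature are illustrative, not the patch's actual code), small segments are copied into an on-heap buffer while larger ones are memory-mapped:

```scala
import java.io.{File, IOException, RandomAccessFile}
import java.nio.ByteBuffer
import java.nio.channels.FileChannel

// Hypothetical helper: serve a file segment either by copying it into a heap
// buffer (cheap for the many small shuffle blocks) or by memory-mapping it
// (avoids the copy, but wastes at least a page per mapping for small blocks).
def readSegment(file: File, offset: Long, length: Long,
                mapThreshold: Long = 2L * 1024 * 1024): ByteBuffer = {
  val channel = new RandomAccessFile(file, "r").getChannel
  try {
    if (length < mapThreshold) {
      // Small block: plain read into an on-heap ByteBuffer.
      val buf = ByteBuffer.allocate(length.toInt)
      channel.position(offset)
      while (buf.remaining() > 0) {
        if (channel.read(buf) == -1) {
          throw new IOException(s"Reached EOF before reading $length bytes from $file")
        }
      }
      buf.flip()
      buf
    } else {
      // Large block: memory-map instead of pulling it all onto the heap.
      channel.map(FileChannel.MapMode.READ_ONLY, offset, length)
    }
  } finally {
    channel.close()
  }
}
```
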
@aarondav
Contributor Author

@rxin thoughts on this? You probably have a better idea on whether this would be too much of a perf hit, and whether we should switch into "map" mode for larger blocks.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21562/
Test FAILed.

Contributor

i think the old code had a config parameter for using mem map or this ..

@aarondav
Contributor Author

Added a non-configurable version of the memory map pathway, with the threshold you suggested (2MB, the size of a hugepage). Note that this fix will also be included in #2753.

@SparkQA

SparkQA commented Oct 10, 2014

QA tests have started for PR 2742 at commit a152065.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 10, 2014

QA tests have finished for PR 2742 at commit a152065.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21583/
Test PASSed.

@rxin
Contributor

rxin commented Oct 10, 2014

LGTM. Merged. Thanks!

@asfgit closed this in 90f73fc on Oct 10, 2014
@mridulm
Contributor

mridulm commented Oct 10, 2014

This needs to be configurable ... IIRC 1.1 had this customizable; see spark.storage.memoryMapThreshold.
Different limits exist for virtual memory vs. heap memory in YARN (for example), so deployments will need to customize this.
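
For reference, a minimal sketch of how such a cutoff could be read from SparkConf, modeled on the 1.1 spark.storage.memoryMapThreshold setting mridulm mentions (the function name is illustrative, and the 2 MB default shown is this PR's hardcoded value, not necessarily the config's own default):

```scala
import org.apache.spark.SparkConf

// Sketch only: wiring the memory-map cutoff through SparkConf so deployments
// can tune it. The key name exists in Spark 1.1 for DiskStore; the default
// here is the 2 MB value hardcoded by this patch, used purely for illustration.
def memoryMapThreshold(conf: SparkConf): Long =
  conf.getLong("spark.storage.memoryMapThreshold", 2L * 1024 * 1024)
```
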

@aarondav
Contributor Author

@mridulm Could you give an example of which way you would want to shift it via config? Map more or less often?

@mridulm
Contributor

mridulm commented Oct 10, 2014

With 1.1, in experiments, we have done both: increasing it when our user code
is mmap'ing too much data (and so we pull things into heap, using
libraries not in our control :-) ), and decreasing it when heap is at a premium.

@mridulm
Contributor

mridulm commented Oct 10, 2014

Note: this is required since there are heap and virtual memory limits enforced, so we
juggle available memory around so that jobs can run to completion!
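
For illustration, if the threshold were exposed as a setting, a deployment squeezed by YARN's container limits could tune it per job. This is a hedged sketch assuming a plain byte value, as with 1.1's getLong-based settings; it is not a recommendation of any particular value:

```scala
import org.apache.spark.SparkConf

// Hypothetical tuning: raise the threshold so more blocks are read on-heap
// (less virtual memory mapped under YARN's container limit), or lower it
// when heap itself is the scarce resource.
val conf = new SparkConf()
  .set("spark.storage.memoryMapThreshold", (8L * 1024 * 1024).toString) // bytes
```
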
