Modify default YARN memory_overhead-- from an additive constant to a multiplier #1391
Conversation
…extFiles The prefix "file:" is missing in the string inserted as key in HashMap
…onsistent with rest of Spark
Can one of the admins verify this patch?
We have gone over this in the past ... it is suboptimal to make it a linear function of executor memory.
That makes sense, but then it doesn't explain why a constant amount works for a given job when executor memory is low and then doesn't work when it is high. This has also been my experience, and I don't have a great grasp on why that would be. More threads and open files in a busy executor? That scales indirectly with how big you need your executor to be, but not directly. Nishkam, do you have a sense of how much extra memory you had to configure to make it work when executor memory increased? Is it pretty marginal, or quite substantial?
Yes, I'm aware of the discussion on this issue in the past. Experiments confirm that overhead is a function of executor memory. Why and how can be figured out with due diligence and analysis. It may be a function of other parameters and the function may be fairly complex. However, the proportionality is undeniable. Besides, we are only adjusting the default value and making it a bit more resilient. The memory_overhead parameter can still be configured by the developer separately. The constant additive factor makes little sense (empirically).
Sean, the memory_overhead is fairly substantial. More than 2GB for a 30GB executor. Less than 400MB for a 2GB executor.
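As a quick sanity check, these two data points line up with a multiplier-plus-floor rule rather than a flat constant. The sketch below is illustrative only and is not the PR's code; the 7% factor and 384 MB floor are assumptions taken from values discussed later in this thread.

```scala
// Illustrative sketch only (not the PR's actual code): overhead modeled as
// max(factor * executorMemory, floor), with factor and floor assumed from
// values discussed later in this thread.
object OverheadCheck {
  val MemoryOverheadFactor = 0.07   // assumed multiplier
  val MemoryOverheadMinMb  = 384    // assumed floor, in MB

  def overheadMb(executorMemoryMb: Int): Int =
    math.max((MemoryOverheadFactor * executorMemoryMb).toInt, MemoryOverheadMinMb)

  def main(args: Array[String]): Unit = {
    println(overheadMb(2 * 1024))    // 384 MB (floor), close to the "less than 400MB" reported
    println(overheadMb(30 * 1024))   // 2150 MB, close to the "more than 2GB" reported
  }
}
```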
The default constant is actually a lower bound, to account for other … This is compounded by magic constants in Spark for various IO ops, non-… Hence sizing this is, unfortunately, app specific.
That would be a function of your jobs.
The basic issue is you are trying to model overhead using the wrong variable.
Yes, of course, lots of settings' best or even usable values are ultimately app-specific. Ideally, defaults work for lots of cases. A flat value is the simplest of models, and anecdotally, the current default value does not work for medium- to large-memory YARN jobs. You can increase the default, but then the overhead gets silly for small jobs -- 1GB? And all of these are not-uncommon use cases. None of that implies the overhead logically scales with container memory. Empirically, it may do, and that's useful. Until the magic explanatory variable is found, which one is less problematic for end users -- a flat constant that frequently has to be tuned, or an imperfect model that could get it right in more cases? That said, it is kind of a developer API change and feels like something to not keep reimagining. Nishkam, can you share any anecdotal evidence about how the overhead changes? If executor memory is the only variable changing, that seems to be evidence against it being driven by other factors, but I don't know if that's what we know.
You are lucky :-) For some of our jobs, in an 8GB container, the overhead is …
Experimented with three different workloads and noticed common patterns of proportionality.
That's why the parameter is configurable. If you have jobs that cause 20-25% memory_overhead, default values will not help.
You are missing my point, I think ... To give an unscientific anecdotal example … In an effort to ensure jobs always run to completion, setting overhead to … I would like a good default estimate of overhead ... But that is not …
Mridul, I think you are missing the point. We understand that this parameter will in a lot of cases have to be specified by the developer, since there is no easy way to model it (that's why we are retaining it as a configurable parameter). However, the question is what a good default value would be.

"I would like a good default estimate of overhead ... But that is not …" You are mistaken. It may not be a directly correlated variable, but it is most certainly indirectly correlated. And it is probably correlated to other app-specific parameters as well.

"Until the magic explanatory variable is found, which one is less problematic for end users -- a flat constant that frequently has to be tuned, or an imperfect model that could get it right in more cases?" This is the right point of view.
On Jul 13, 2014 3:16 PM, "nishkamravi2" [email protected] wrote: …

It does not help to estimate using the wrong variable. The only actual correlation between executor memory and overhead is the java vm …

Please see above.

Which has been our view even in previous discussions :-) Note that this estimation would be very volatile to Spark internals …
Correction: (and that is NOT very high as a fraction). Typing on phones can suck :-) To add to Sean's point: we definitely need to estimate this better.
Hmm, looks like some of my responses to Sean via mail reply have not shown up here ... Maybe mail gateway delays?
Since this is a recurring nightmare for our users, let me try to list down … If it so happens that at the end of the exercise it is a linear function of memory, …
Bringing the discussion back online. Thanks for all the input so far. Ran a few experiments yesterday and today. The number of executors (which was the other main handle we wanted to factor in) doesn't seem to have any noticeable impact. Tried a few other parameters such as num_partitions and default_parallelism, but nothing sticks. Confirmed the proportionality with container size. Have also been trying to tune the multiplier to minimize potential waste, and I think 6% (as opposed to the 7% we currently have) is the lowest we should go. Modifying the PR accordingly.
I'll let mridul comment on this, but I think adding a comment explaining where 0.06 came from would be useful.
6% was experimentally obtained (with the goal of keeping the bound as tight as possible without the containers crashing). Three workloads were experimented with: PageRank, WordCount and KMeans, over moderate to large input datasets, configured such that the containers are optimally utilized (neither under-utilized nor over-subscribed). Based on my observations, less than 5% is a no-no. If someone would like to tune this parameter further and make a case for a higher value (keeping in mind that this is a default that will not cover all workloads), that would be helpful.
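To make the proposal concrete, here is a minimal sketch of how such a default could be resolved, assuming the existing spark.yarn.executor.memoryOverhead setting and the 6% figure under discussion at this point (the PR later reverts to 7%); the object and method names are hypothetical rather than the actual diff.

```scala
import org.apache.spark.SparkConf

// Sketch, not the PR diff: the multiplier only supplies a default; an explicit
// spark.yarn.executor.memoryOverhead setting still takes precedence.
object OverheadDefault {
  val MemoryOverheadFactor = 0.06   // value under discussion here; later changed back to 0.07
  val MemoryOverheadMinMb  = 384

  def resolveOverheadMb(conf: SparkConf, executorMemoryMb: Int): Int =
    conf.getInt(
      "spark.yarn.executor.memoryOverhead",
      math.max((MemoryOverheadFactor * executorMemoryMb).toInt, MemoryOverheadMinMb))
}
```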
Can one of the admins verify this patch?
line too long, here and other places
What is the current state of this PR? @tgravescs @mridulm any more thoughts about the current approach? There is a related PR for Mesos (#2401), and I'm wondering if we can use the same approach in both places.
Updated as per @andrewor14's comments.
These changes look good to me. This addresses what continues to be the #1 issue that we see in Cloudera customer YARN deployments. It's worth considering boosting this when using PySpark, but that's probably work for another JIRA.
@nishkamravi2 mind resolving the merge conflicts?
@sryza Thanks Sandy. Will do.
@mridulm any comments? I'm ok with it if it's a consistent problem for users. One thing we definitely need to do is document it, and possibly look at including better log and error messages. We should at least log the size of the overhead it calculates. It would also be nice to log what it is when we fail to get a large enough container, or when the request fails because the cluster's max allocation limit was hit.
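As a rough illustration of the logging being asked for here (not the PR's code; the object, method, and message wording are hypothetical), something along these lines would surface both the computed overhead and requests that exceed the cluster maximum:

```scala
import org.slf4j.LoggerFactory

// Hypothetical sketch of the requested logging: record the computed overhead
// and flag container requests that exceed the cluster's maximum allocation.
object OverheadLogging {
  private val log = LoggerFactory.getLogger(getClass)

  def logContainerRequest(executorMemoryMb: Int, overheadMb: Int, clusterMaxMb: Int): Unit = {
    val requestedMb = executorMemoryMb + overheadMb
    log.info(s"Requesting container of $requestedMb MB " +
      s"($executorMemoryMb MB executor memory + $overheadMb MB overhead)")
    if (requestedMb > clusterMaxMb) {
      log.error(s"Required container memory ($requestedMb MB) is above the cluster's " +
        s"maximum allocation ($clusterMaxMb MB)")
    }
  }
}
```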
Have redone the PR against the recent master branch, which has undergone significant structural changes for YARN. Addressed review comments and changed the multiplier back to 0.07 (to err on the conservative side, since customers are running into this issue).
If #2485 is the replacement, can we close this one out?
Shall we let this linger on for just a bit until the other one gets merged?
Noticed that we have a reference to this one in #2485, closing it out.
…multiplier

Redone against the recent master branch (#1391)

Author: Nishkam Ravi <[email protected]>
Author: nravi <[email protected]>
Author: nishkamravi2 <[email protected]>

Closes #2485 from nishkamravi2/master_nravi and squashes the following commits:

636a9ff [nishkamravi2] Update YarnAllocator.scala
8f76c8b [Nishkam Ravi] Doc change for yarn memory overhead
35daa64 [Nishkam Ravi] Slight change in the doc for yarn memory overhead
5ac2ec1 [Nishkam Ravi] Remove out
dac1047 [Nishkam Ravi] Additional documentation for yarn memory overhead issue
42c2c3d [Nishkam Ravi] Additional changes for yarn memory overhead issue
362da5e [Nishkam Ravi] Additional changes for yarn memory overhead
c726bd9 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
f00fa31 [Nishkam Ravi] Improving logging for AM memoryOverhead
1cf2d1e [nishkamravi2] Update YarnAllocator.scala
ebcde10 [Nishkam Ravi] Modify default YARN memory_overhead-- from an additive constant to a multiplier (redone to resolve merge conflicts)
2e69f11 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi
efd688a [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark
2b630f9 [nravi] Accept memory input as "30g", "512M" instead of an int value, to be consistent with rest of Spark
3bf8fad [nravi] Merge branch 'master' of https://github.com/apache/spark
5423a03 [nravi] Merge branch 'master' of https://github.com/apache/spark
eb663ca [nravi] Merge branch 'master' of https://github.com/apache/spark
df2aeb1 [nravi] Improved fix for ConcurrentModificationIssue (Spark-1097, Hadoop-10456)
6b840f0 [nravi] Undo the fix for SPARK-1758 (the problem is fixed)
5108700 [nravi] Fix in Spark for the Concurrent thread modification issue (SPARK-1097, HADOOP-10456)
681b36f [nravi] Fix for SPARK-1758: failing test org.apache.spark.JavaAPISuite.wholeTextFiles
Related to #894 and https://issues.apache.org/jira/browse/SPARK-2398. Experiments show that memory_overhead grows with container size. The multiplier has been experimentally obtained and can potentially be improved over time.