-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-18066] [CORE] [TESTS] Add Pool usage policies test coverage for FIFO & FAIR Schedulers #15604
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi @kayousterhout @markhamstra @squito, |
|
Jenkins, ok to test |
squito
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this @erenavsarogullari , great to have more tests. I have super trivial suggested changes, but one bigger question for other committers I'd like to wait a bit to hear back on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm all for more tests, but this final case here seems pretty unnecessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit: rename to "FAIR Scheduler creates a new pool when spark.scheduler.pool property points to a non-existent pool"
bigger question:
@markhamstra @kayousterhout Is this really the desired behavior? Or is there a bug -- should it fail fast? It looks like this behavior was intentional. Then again, that comment makes it sound like the user is going to directly modify the weight and minShare of the constructed pool, but though they are vars, its private [spark] and there isn't anything else exposing it to the user.
I prefer fail-fast behavior, but it seems like fair scheduler configuration prefers to assume defaults when there is misconfiguration. If you think this the right behavior, than there is probably some minor cleanup to do in Pool & SchedulableBuilder to clarify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tracked this down and looks like it was changed back in 2013. @xiajunluan do you remember the motivation for this?
I'm torn because in general I prefer failing fast (e.g., this is better in the case where someone has a small typo they might not notice, and is intending to use a configured pool). On the other hand, I'm pretty sure I've abused this behavior in the past to avoid configuring pools when I want a bunch of equally weighted pools. If others are doing that, we shouldn't change the behavior (although I think that, in that case, we should add a big warning / future-deprecation message).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I prefer failing fast for the same reason. But I guess we can't actually change this, since there is a valid use case for this behavior. We don't really know if others are using this, and I don't think we should change it under them in the next release.
In any case, I still think we should also change Pool.weight and Pool.minShare to vals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, at least, i think we can inform the user by emphasizing that the related pool does not exist and it is created with default settings.
Our current logging just points pool creation. WDYT?
Also Pool var weight / minShare have been addressed ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nit: the jira doesn't really say anything other than "add these tests", so not much point in listing it here, just name test "FIFO Scheduler just uses root pool".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. Also how about "FIFO scheduler uses root pool and not spark.scheduler.pool" and then adding a comment at the top of the test that says something like "spark.scheduler.pool should be ignored for the FIFO scheduler, because pools are only needed for fair scheduling"? Took me a minute to figure out what was going on here.
|
hmm, so I'm thinking about this again, along with your other pr #15237 and it got me thinking -- there also aren't any tests for a mix of FIFO and FAIR (at least, I don't see it in "Nested Pool Test" in PoolSuite). While you are doing this, would you like to add some tests for that as well? (to be perfectly honest, I need to spend a little bit of time thinking about what that behavior should be anyway, but I will have to do that a little later.) |
kayousterhout
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding these tests!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. Also how about "FIFO scheduler uses root pool and not spark.scheduler.pool" and then adding a comment at the top of the test that says something like "spark.scheduler.pool should be ignored for the FIFO scheduler, because pools are only needed for fair scheduling"? Took me a minute to figure out what was going on here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you re-phrase this as what should happen, which makes these comments easier to read? e.g.
"When task sets are submitted, they should be added to the root pool, and no additional pools should be created (even though there's a configured default pool)."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: +2 spaces here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nit with commenting, but can you eliminate the blank lines on 219 and 221 (which don't help readability) and change this comment to something like
"Submit a new task set manager with pool properties set to null. This should result in the task set manager getting added to the default pool."
(it's helpful in test comments to be clear about what the test is doing itself, versus the expected behavior it's verifying)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the above, can you eliminate the blank lines here, and change this to "When a task set manager is submitted with spark.scheduler.pool unset, it should be added to the default pool (as above)."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tracked this down and looks like it was changed back in 2013. @xiajunluan do you remember the motivation for this?
I'm torn because in general I prefer failing fast (e.g., this is better in the case where someone has a small typo they might not notice, and is intending to use a configured pool). On the other hand, I'm pretty sure I've abused this behavior in the past to avoid configuring pools when I want a bunch of equally weighted pools. If others are doing that, we shouldn't change the behavior (although I think that, in that case, we should add a big warning / future-deprecation message).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you clean up this comment a bit too?
"The fair scheduler should create a new pool with default values when spark.scheduler.pool points to a pool that doesn't exist yet (this can happen when the file that pools are read from isn't set, or when that file doesn't contain the pool name specified by spark.scheduler.pool)."
|
Test build #70832 has finished for PR 15604 at commit
|
|
@erenavsarogullari what's the status with this PR? I know you have at least one other fair scheduler PR outstanding, and it makes reviews easier to minimize the number of outstanding things (and helps avoid merge conflicts). |
|
Test build #72544 has finished for PR 15604 at commit
|
|
@kayousterhout thanks for query this PR. I will be submitting this and the other one( #15326) once #16813 is merged. |
|
Ok thanks for the update! |
|
Test build #72722 has finished for PR 15604 at commit
|
|
Jenkins, retest this please |
|
Hi @kayousterhout and @squito, |
|
Test build #72725 has finished for PR 15604 at commit
|
|
build failure looks unrelated (due to timeout at KafkaSourceSuite level) Jenkins, retest this please |
|
Jenkins, retest this please |
|
Test build #72736 has finished for PR 15604 at commit
|
|
Test build #72792 has finished for PR 15604 at commit
|
|
Hi @kayousterhout and @squito, |
|
Hi @kayousterhout and @squito, |
kayousterhout
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last small commeent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not necessary to make test functions private -- they're not accessible anyway (and we don't do this elsewhere in tests)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JOOC does this result in a warning being printed to the user? (I know you can't test for that -- but it does seem like useful behavior to have)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, this behavior is not highlighted via logs and we just points the pool creation as
Created pool: ..., schedulingMode: ..., minShare: ..., weight: ... so i agree it can be useful for the user and it is addressed with the latest commits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating this!
|
Hi @kayousterhout and @squito, |
|
Test build #74408 has finished for PR 15604 at commit
|
|
Test build #74410 has finished for PR 15604 at commit
|
|
Test build #74412 has finished for PR 15604 at commit
|
|
Test build #74413 has finished for PR 15604 at commit
|
|
The following unit tests look failed due to timeout and irrelevant so triggered again. 1- org.apache.spark.storage.BlockManagerProactiveReplicationSuite.proactive block replication - 5 replicas - 4 block manager deletions Error Message: org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to eventually never returned normally. Attempted 493 times over 5.007521253999999 seconds. Last failure message: 4 did not equal 5. 2- org.apache.spark.deploy.SparkSubmitSuite.includes jars passed in through --packages Error Message: org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to failAfter did not complete within 60 seconds. |
|
Jenkins, retest this please |
|
@erenavsarogullari please file a JIRA when you see test failures instead of ignoring them. I updated https://issues.apache.org/jira/browse/SPARK-19803 for the first failure, but please file a JIRA for the second one. |
|
Jenkins retest this please |
|
Test build #74575 has finished for PR 15604 at commit
|
kayousterhout
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great -- one last comment update and then I can merge this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not totally clear what the first part of this error message means. How about changing the message to:
logWarning(s"A job was submitted with scheduler pool $poolName, which has not been configured." +
"This can happen when the file that pools are read from isn't set, or when that file " +
s"doesn't contain $poolName. Created $poolName with default configuration (" +
s"schedulingMode: $DEFAULT_SCHEDULING_MODE, minShare: $DEFAULT_MINIMUM_SHARE, " +
s"weight: $DEFAULT_WEIGHT)")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, it is updated ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating this!
|
Many thanks @kayousterhout for reviewing this again. Last comment has also been addressed. Also the following Jira has been created for previous second UT failure. |
|
Test build #74616 has finished for PR 15604 at commit
|
|
Thanks for your work on this @erenavsarogullari. I've merged this into master. |
|
Test build #74618 has finished for PR 15604 at commit
|
…utability and access ## What changes were proposed in this pull request? Some `Schedulable` Entities(`Pool` and `TaskSetManager`) variables need refactoring for _immutability_ and _access modifiers_ levels as follows: - From `var` to `val` (if there is no requirement): This is important to support immutability as much as possible. - Sample => `Pool`: `weight`, `minShare`, `priority`, `name` and `taskSetSchedulingAlgorithm`. - Access modifiers: Specially, `var`s access needs to be restricted from other parts of codebase to prevent potential side effects. - `TaskSetManager`: `tasksSuccessful`, `totalResultSize`, `calculatedTasks` etc... This PR is related with apache#15604 and has been created seperatedly to keep patch content as isolated and to help the reviewers. ## How was this patch tested? Added new UTs and existing UT coverage. Author: erenavsarogullari <[email protected]> Closes apache#16905 from erenavsarogullari/SPARK-19567.
What changes were proposed in this pull request?
The following FIFO & FAIR Schedulers Pool usage cases need to have unit test coverage :
spark.scheduler.poolproperty is set, related pool is not created andTaskSetManagersare added to root pool.default poolwhenspark.scheduler.poolproperty is not set. This can be happened whenPropertiesobject is null,Propertiesobject is empty(new Properties()),spark.scheduler.pool=default).spark.scheduler.poolproperty points a non-existent pool. This can be happened when scheduler allocation file is not set or it does not contain related pool.How was this patch tested?
New Unit tests are added.