Skip to content

Conversation

@erenavsarogullari
Copy link
Member

@erenavsarogullari erenavsarogullari commented Oct 23, 2016

What changes were proposed in this pull request?

The following FIFO & FAIR Schedulers Pool usage cases need to have unit test coverage :

  • FIFO Scheduler just uses root pool so even if spark.scheduler.pool property is set, related pool is not created and TaskSetManagers are added to root pool.
  • FAIR Scheduler uses default pool when spark.scheduler.pool property is not set. This can be happened when
    • Properties object is null,
    • Properties object is empty(new Properties()),
    • default pool is set(spark.scheduler.pool=default).
  • FAIR Scheduler creates a new pool with default values when spark.scheduler.pool property points a non-existent pool. This can be happened when scheduler allocation file is not set or it does not contain related pool.

How was this patch tested?

New Unit tests are added.

@erenavsarogullari erenavsarogullari changed the title Add Pool usage policies test coverage for FIFO & FAIR Schedulers [SPARK-18066] [CORE] [TESTS] Add Pool usage policies test coverage for FIFO & FAIR Schedulers Oct 23, 2016
@erenavsarogullari
Copy link
Member Author

cc @kayousterhout @markhamstra

@erenavsarogullari
Copy link
Member Author

Hi @kayousterhout @markhamstra @squito,
This PR aims to extend Pool usage policies unit test coverage and is also ready for review.
Thanks and all feedbacks are welcome in advance ;)

@squito
Copy link
Contributor

squito commented Jan 3, 2017

Jenkins, ok to test

Copy link
Contributor

@squito squito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @erenavsarogullari , great to have more tests. I have super trivial suggested changes, but one bigger question for other committers I'd like to wait a bit to hear back on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm all for more tests, but this final case here seems pretty unnecessary

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit: rename to "FAIR Scheduler creates a new pool when spark.scheduler.pool property points to a non-existent pool"

bigger question:
@markhamstra @kayousterhout Is this really the desired behavior? Or is there a bug -- should it fail fast? It looks like this behavior was intentional. Then again, that comment makes it sound like the user is going to directly modify the weight and minShare of the constructed pool, but though they are vars, its private [spark] and there isn't anything else exposing it to the user.

I prefer fail-fast behavior, but it seems like fair scheduler configuration prefers to assume defaults when there is misconfiguration. If you think this the right behavior, than there is probably some minor cleanup to do in Pool & SchedulableBuilder to clarify.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tracked this down and looks like it was changed back in 2013. @xiajunluan do you remember the motivation for this?

I'm torn because in general I prefer failing fast (e.g., this is better in the case where someone has a small typo they might not notice, and is intending to use a configured pool). On the other hand, I'm pretty sure I've abused this behavior in the past to avoid configuring pools when I want a bunch of equally weighted pools. If others are doing that, we shouldn't change the behavior (although I think that, in that case, we should add a big warning / future-deprecation message).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I prefer failing fast for the same reason. But I guess we can't actually change this, since there is a valid use case for this behavior. We don't really know if others are using this, and I don't think we should change it under them in the next release.

In any case, I still think we should also change Pool.weight and Pool.minShare to vals.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, at least, i think we can inform the user by emphasizing that the related pool does not exist and it is created with default settings.
Our current logging just points pool creation. WDYT?

Also Pool var weight / minShare have been addressed ;)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: the jira doesn't really say anything other than "add these tests", so not much point in listing it here, just name test "FIFO Scheduler just uses root pool".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Also how about "FIFO scheduler uses root pool and not spark.scheduler.pool" and then adding a comment at the top of the test that says something like "spark.scheduler.pool should be ignored for the FIFO scheduler, because pools are only needed for fair scheduling"? Took me a minute to figure out what was going on here.

@squito
Copy link
Contributor

squito commented Jan 3, 2017

hmm, so I'm thinking about this again, along with your other pr #15237

and it got me thinking -- there also aren't any tests for a mix of FIFO and FAIR (at least, I don't see it in "Nested Pool Test" in PoolSuite). While you are doing this, would you like to add some tests for that as well? (to be perfectly honest, I need to spend a little bit of time thinking about what that behavior should be anyway, but I will have to do that a little later.)

Copy link
Contributor

@kayousterhout kayousterhout left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding these tests!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Also how about "FIFO scheduler uses root pool and not spark.scheduler.pool" and then adding a comment at the top of the test that says something like "spark.scheduler.pool should be ignored for the FIFO scheduler, because pools are only needed for fair scheduling"? Took me a minute to figure out what was going on here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you re-phrase this as what should happen, which makes these comments easier to read? e.g.

"When task sets are submitted, they should be added to the root pool, and no additional pools should be created (even though there's a configured default pool)."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: +2 spaces here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit with commenting, but can you eliminate the blank lines on 219 and 221 (which don't help readability) and change this comment to something like

"Submit a new task set manager with pool properties set to null. This should result in the task set manager getting added to the default pool."

(it's helpful in test comments to be clear about what the test is doing itself, versus the expected behavior it's verifying)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the above, can you eliminate the blank lines here, and change this to "When a task set manager is submitted with spark.scheduler.pool unset, it should be added to the default pool (as above)."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tracked this down and looks like it was changed back in 2013. @xiajunluan do you remember the motivation for this?

I'm torn because in general I prefer failing fast (e.g., this is better in the case where someone has a small typo they might not notice, and is intending to use a configured pool). On the other hand, I'm pretty sure I've abused this behavior in the past to avoid configuring pools when I want a bunch of equally weighted pools. If others are doing that, we shouldn't change the behavior (although I think that, in that case, we should add a big warning / future-deprecation message).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you clean up this comment a bit too?

"The fair scheduler should create a new pool with default values when spark.scheduler.pool points to a pool that doesn't exist yet (this can happen when the file that pools are read from isn't set, or when that file doesn't contain the pool name specified by spark.scheduler.pool)."

@SparkQA
Copy link

SparkQA commented Jan 4, 2017

Test build #70832 has finished for PR 15604 at commit f630cff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kayousterhout
Copy link
Contributor

@erenavsarogullari what's the status with this PR? I know you have at least one other fair scheduler PR outstanding, and it makes reviews easier to minimize the number of outstanding things (and helps avoid merge conflicts).

@SparkQA
Copy link

SparkQA commented Feb 7, 2017

Test build #72544 has finished for PR 15604 at commit f1cfc9e.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@erenavsarogullari
Copy link
Member Author

@kayousterhout thanks for query this PR. I will be submitting this and the other one( #15326) once #16813 is merged.

@kayousterhout
Copy link
Contributor

Ok thanks for the update!

@SparkQA
Copy link

SparkQA commented Feb 10, 2017

Test build #72722 has finished for PR 15604 at commit 84513b1.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@erenavsarogullari
Copy link
Member Author

Jenkins, retest this please

@erenavsarogullari
Copy link
Member Author

Hi @kayousterhout and @squito,
Firstly, many thanks for the review. All comments are addressed. This is ready to re-review ;)

@SparkQA
Copy link

SparkQA commented Feb 11, 2017

Test build #72725 has finished for PR 15604 at commit f84abe7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@erenavsarogullari
Copy link
Member Author

build failure looks unrelated (due to timeout at KafkaSourceSuite level)

Jenkins, retest this please

@erenavsarogullari
Copy link
Member Author

Jenkins, retest this please

@SparkQA
Copy link

SparkQA commented Feb 11, 2017

Test build #72736 has finished for PR 15604 at commit 9eafad5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 13, 2017

Test build #72792 has finished for PR 15604 at commit d68bb0c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@erenavsarogullari
Copy link
Member Author

Hi @kayousterhout and @squito,
This PR is ready to re-review / merge if it is possible.
All feedbacks are welcome and thanks in advance ;)

@erenavsarogullari
Copy link
Member Author

Hi @kayousterhout and @squito,
This PR is ready to re-review.
All feedbacks are welcome and thanks in advance ;)

Copy link
Contributor

@kayousterhout kayousterhout left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last small commeent

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not necessary to make test functions private -- they're not accessible anyway (and we don't do this elsewhere in tests)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JOOC does this result in a warning being printed to the user? (I know you can't test for that -- but it does seem like useful behavior to have)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, this behavior is not highlighted via logs and we just points the pool creation as
Created pool: ..., schedulingMode: ..., minShare: ..., weight: ... so i agree it can be useful for the user and it is addressed with the latest commits.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating this!

@erenavsarogullari
Copy link
Member Author

Hi @kayousterhout and @squito,
Last comments are addressed so this PR is ready to re-review / merge if it is possible.
Thanks in advance ;)

@SparkQA
Copy link

SparkQA commented Mar 12, 2017

Test build #74408 has finished for PR 15604 at commit 4793ea8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 12, 2017

Test build #74410 has finished for PR 15604 at commit e9bb2b8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 12, 2017

Test build #74412 has finished for PR 15604 at commit 91f3814.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 12, 2017

Test build #74413 has finished for PR 15604 at commit 17e11f0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@erenavsarogullari
Copy link
Member Author

The following unit tests look failed due to timeout and irrelevant so triggered again.

1- org.apache.spark.storage.BlockManagerProactiveReplicationSuite.proactive block replication - 5 replicas - 4 block manager deletions

Error Message: org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to eventually never returned normally. Attempted 493 times over 5.007521253999999 seconds. Last failure message: 4 did not equal 5.

2- org.apache.spark.deploy.SparkSubmitSuite.includes jars passed in through --packages

Error Message: org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to failAfter did not complete within 60 seconds.

@erenavsarogullari
Copy link
Member Author

Jenkins, retest this please

@kayousterhout
Copy link
Contributor

@erenavsarogullari please file a JIRA when you see test failures instead of ignoring them. I updated https://issues.apache.org/jira/browse/SPARK-19803 for the first failure, but please file a JIRA for the second one.

@kayousterhout
Copy link
Contributor

Jenkins retest this please

@SparkQA
Copy link

SparkQA commented Mar 15, 2017

Test build #74575 has finished for PR 15604 at commit 17e11f0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@kayousterhout kayousterhout left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great -- one last comment update and then I can merge this!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not totally clear what the first part of this error message means. How about changing the message to:

logWarning(s"A job was submitted with scheduler pool $poolName, which has not been configured." +
    "This can happen when the file that pools are read from isn't set, or when that file " +
    s"doesn't contain $poolName.  Created $poolName with default configuration (" +
    s"schedulingMode: $DEFAULT_SCHEDULING_MODE, minShare: $DEFAULT_MINIMUM_SHARE, " +
    s"weight: $DEFAULT_WEIGHT)")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it is updated ;)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating this!

@erenavsarogullari
Copy link
Member Author

Many thanks @kayousterhout for reviewing this again. Last comment has also been addressed.

Also the following Jira has been created for previous second UT failure.
https://issues.apache.org/jira/browse/SPARK-19964

@SparkQA
Copy link

SparkQA commented Mar 15, 2017

Test build #74616 has finished for PR 15604 at commit f13ad3d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kayousterhout
Copy link
Contributor

Thanks for your work on this @erenavsarogullari. I've merged this into master.

@asfgit asfgit closed this in 046b8d4 Mar 15, 2017
@SparkQA
Copy link

SparkQA commented Mar 15, 2017

Test build #74618 has finished for PR 15604 at commit b7f2629.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

ghost pushed a commit to dbtsai/spark that referenced this pull request Mar 24, 2017
…utability and access

## What changes were proposed in this pull request?
Some `Schedulable` Entities(`Pool` and `TaskSetManager`) variables need refactoring for _immutability_ and _access modifiers_ levels as follows:
- From `var` to `val` (if there is no requirement): This is important to support immutability as much as possible.
  - Sample => `Pool`: `weight`, `minShare`, `priority`, `name` and `taskSetSchedulingAlgorithm`.
- Access modifiers: Specially, `var`s access needs to be restricted from other parts of codebase to prevent potential side effects.
  - `TaskSetManager`: `tasksSuccessful`, `totalResultSize`, `calculatedTasks` etc...

This PR is related with apache#15604 and has been created seperatedly to keep patch content as isolated and to help the reviewers.

## How was this patch tested?
Added new UTs and existing UT coverage.

Author: erenavsarogullari <[email protected]>

Closes apache#16905 from erenavsarogullari/SPARK-19567.
@erenavsarogullari erenavsarogullari deleted the SPARK-18066 branch March 25, 2017 10:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants