@CodingCat
Contributor

https://issues.apache.org/jira/browse/SPARK-5337

Currently, the Master does not consider spark.task.cpus when scheduling applications, so we may fall into one of the following cases:

  • the executor gets N cores but we need M cores to run a single task, where N < M, so no task can run on that executor

  • the executor gets N cores and we need M cores to run a single task, where N > M and N % M != 0, so some cores in the executor are wasted
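
To make the two cases concrete, here is a minimal arithmetic sketch. It is plain Python, not Spark code: `task_slots` and `wasted_cores` are hypothetical helper names, and it assumes each task reserves exactly spark.task.cpus cores on its executor.

```python
def task_slots(executor_cores: int, task_cpus: int) -> int:
    """Number of tasks that can run concurrently on one executor."""
    if executor_cores <= 0 or task_cpus <= 0:
        raise ValueError("core counts must be positive")
    return executor_cores // task_cpus


def wasted_cores(executor_cores: int, task_cpus: int) -> int:
    """Cores on the executor that no task can ever occupy."""
    return executor_cores - task_slots(executor_cores, task_cpus) * task_cpus


# Case 1: N = 2, M = 4 -> no task fits on the executor at all.
case1 = (task_slots(2, 4), wasted_cores(2, 4))

# Case 2: N = 6, M = 4 -> one task fits, the remaining cores sit idle.
case2 = (task_slots(6, 4), wasted_cores(6, 4))

# N is a multiple of M: every core can be used.
aligned = (task_slots(8, 4), wasted_cores(8, 4))
```

Under these assumptions, only an executor whose core count is a multiple of spark.task.cpus (and at least as large) uses all of its cores.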

The patch for YARN was submitted by @WangTaoTheTonic: #4123

@SparkQA

SparkQA commented Sep 5, 2015

Test build #42037 has finished for PR 8610 at commit d3289fc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@CodingCat
Contributor Author

retest this please

@CodingCat
Contributor Author

Jenkins, retest this please.

@CodingCat
Contributor Author

...first time I've learned that I have permission to retest stuff...

@SparkQA

SparkQA commented Sep 5, 2015

Test build #42055 has finished for PR 8610 at commit d3289fc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 5, 2015

Test build #42054 has finished for PR 8610 at commit d3289fc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 5, 2015

Test build #42059 has finished for PR 8610 at commit 9c16dc7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 5, 2015

Test build #42060 has finished for PR 8610 at commit 0871c37.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@CodingCat
Contributor Author

Hi, @andrewor14, I just added some test cases here. Would you mind reviewing?

@SparkQA

SparkQA commented Sep 7, 2015

Test build #42094 has finished for PR 8610 at commit 44c6a03.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

I don't think this error message is super clear. What is meant by "no less than and folds of spark.task.cups"?

Contributor Author

@holdenk how about now?

Contributor

clearer, thanks :)

@SparkQA

SparkQA commented Sep 9, 2015

Test build #42211 has finished for PR 8610 at commit 169141a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@CodingCat
Contributor Author

Jenkins, retest this please

@SparkQA

SparkQA commented Sep 9, 2015

Test build #42217 has finished for PR 8610 at commit 169141a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@CodingCat
Contributor Author

Jenkins, retest this please

1 similar comment
@CodingCat
Contributor Author

Jenkins, retest this please

@SparkQA

SparkQA commented Sep 10, 2015

Test build #42229 has finished for PR 8610 at commit 169141a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@CodingCat
Contributor Author

@andrewor14 would you have a chance to review this?

Contributor

nitpick: coreNumPerTask sounds weird. coresPerTask is what you've used everywhere else, and the name above is coresPerExecutor.

@SparkQA

SparkQA commented Nov 23, 2015

Test build #46536 has finished for PR 8610 at commit f27f5d8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@CodingCat
Contributor Author

Jenkins, retest this please

1 similar comment
@CodingCat
Contributor Author

Jenkins, retest this please

@CodingCat
Contributor Author

Hi, @dragos, thanks for the comments. I just synced the patch with master and addressed your comments.

@andrewor14 any plan to fix this in the coming 1.6?

@andrewor14
Contributor

The 1.6 preview is already cut. This will have to come later. Until then, it would be good for @tnachen and @dragos to have a look.

@CodingCat
Contributor Author

OK, thx

One more question: am I supposed to have permission to trigger Jenkins to retest the patch? I thought I was once able to do that... or is it due to the weird status of the bot?

@CodingCat
Contributor Author

Jenkins, retest this please

@CodingCat
Contributor Author

nvm, the bot fixed the relationship with me .....

Contributor

IMO it's not clear what the relationship between coresPerTask and coresPerExecutor is. Looking at the scheduling requirement, we just need the maximum of the set to be able to schedule, and I thought cores per task was additional CPU resources on top of the executor.
Can we perhaps add a comment where this is introduced explaining what these two are? Or point to documentation?

@SparkQA

SparkQA commented Nov 23, 2015

Test build #46549 has finished for PR 8610 at commit f27f5d8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dragos
Contributor

dragos commented Nov 25, 2015

I'll have a look at it tomorrow, thanks for pinging me.

Contributor

don't remove this...

@andrewor14
Contributor

By the way, I think it's totally fine to just do this for standalone mode first, since Mesos doesn't yet read spark.executor.cores. We can fix it for Mesos later.

@SparkQA

SparkQA commented Feb 4, 2016

Test build #50715 has finished for PR 8610 at commit db57a2e.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@CodingCat
Contributor Author

@andrewor14 thanks for reviewing this. I'm currently traveling; I will move #4123 and the check here to SparkSubmit later (tomorrow).

I think we need to check whether coresPerExecutor is no less than coresPerTask in SparkSubmit only when the user has explicitly configured coresPerExecutor.

The reason is that when the user does not explicitly specify coresPerExecutor (the worker will start an executor whenever there is a free core), there is still a good chance that executors have more cores than coresPerTask. If we simply throw an exception in this case, the user has to set an explicit value for coresPerExecutor whenever they set a value for coresPerTask, which is not ideal in terms of user-friendliness. Your thoughts?
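
A minimal sketch of the rule proposed here, in plain Python rather than Spark code. `validate_cores` is a hypothetical name and the configuration is modeled as a simple string mapping; the only Spark facts assumed are the config keys spark.executor.cores and spark.task.cpus and the default of 1 for spark.task.cpus.

```python
from typing import Mapping


def validate_cores(conf: Mapping[str, str]) -> None:
    """Reject the config only when spark.executor.cores is set explicitly
    and is smaller than spark.task.cpus."""
    task_cpus = int(conf.get("spark.task.cpus", "1"))
    executor_cores = conf.get("spark.executor.cores")

    if executor_cores is None:
        # Not explicitly configured: skip the check so that users who only
        # set spark.task.cpus are not forced to set spark.executor.cores too.
        return

    if int(executor_cores) < task_cpus:
        raise ValueError(
            f"spark.executor.cores ({executor_cores}) must be no less than "
            f"spark.task.cpus ({task_cpus})"
        )


# Only spark.task.cpus is set: accepted without complaint.
validate_cores({"spark.task.cpus": "4"})

# Both are set and consistent: accepted.
validate_cores({"spark.task.cpus": "2", "spark.executor.cores": "4"})
```

With this shape, a configuration such as spark.task.cpus=4 and spark.executor.cores=2 raises, while leaving spark.executor.cores unset never does.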

@CodingCat changed the title from "[SPARK-5337][Mesos][Standalone] respect spark.task.cpus when scheduling Applications" to "[SPARK-5337][Standalone] respect spark.task.cpus when scheduling Applications" on Feb 4, 2016
@SparkQA

SparkQA commented Feb 4, 2016

Test build #50719 has finished for PR 8610 at commit ae3e5e4.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@CodingCat
Contributor Author

Jenkins, retest it please

@CodingCat
Contributor Author

Jenkins, retest this please

@SparkQA

SparkQA commented Feb 4, 2016

Test build #50743 has finished for PR 8610 at commit e50e705.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 5, 2016

Test build #50817 has finished for PR 8610 at commit 8286e97.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@CodingCat
Contributor Author

Hi, @andrewor14, when I had second thoughts about your suggestion to put the parameter checking in SparkSubmit, I found that it may not be the right way to do this.

The reason is that users can always set the parameters in code instead of through spark-submit. Parameters set programmatically are read after those set through spark-submit. So we have to check whether the parameters make sense right before we use them to create the application descriptor, e.g. ApplicationDescription in standalone mode. Otherwise, we may miss parameters that are set after the spark-submit parameters are read...
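
The ordering problem can be sketched as follows. This is an illustration in plain Python, not Spark code: `check_cores` and `build_app_description` are hypothetical stand-ins, and the configuration is modeled as a mutable dict that application code can change after submit-time values have been read.

```python
from typing import Dict, Optional


def check_cores(conf: Dict[str, str]) -> None:
    """Raise if an explicit spark.executor.cores is below spark.task.cpus."""
    executor_cores = conf.get("spark.executor.cores")
    task_cpus = int(conf.get("spark.task.cpus", "1"))
    if executor_cores is not None and int(executor_cores) < task_cpus:
        raise ValueError(
            f"spark.executor.cores ({executor_cores}) must be no less than "
            f"spark.task.cpus ({task_cpus})"
        )


def build_app_description(conf: Dict[str, str]) -> Dict[str, Optional[int]]:
    """Hypothetical stand-in for creating the application descriptor."""
    check_cores(conf)  # validate the final configuration, right before use
    executor_cores = conf.get("spark.executor.cores")
    return {
        "cores_per_executor": None if executor_cores is None else int(executor_cores),
        "cores_per_task": int(conf.get("spark.task.cpus", "1")),
    }


# Values as they might arrive from spark-submit: an early check passes.
conf = {"spark.executor.cores": "4", "spark.task.cpus": "1"}
check_cores(conf)

# Application code then overrides a value programmatically.
conf["spark.task.cpus"] = "8"

# Only a check at the point of use sees the final value.
try:
    build_app_description(conf)
    late_check_failed = False
except ValueError:
    late_check_failed = True
```

In this sketch the submit-time check passes and the inconsistent value is caught only because the check runs again where the descriptor is built.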

@SparkQA

SparkQA commented Feb 6, 2016

Test build #50847 has finished for PR 8610 at commit d644654.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@CodingCat
Contributor Author

Jenkins, retest this please

@SparkQA

SparkQA commented Feb 6, 2016

Test build #50853 has finished for PR 8610 at commit d644654.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Jun 15, 2016

Thanks for the pull request. I'm going through a list of pull requests to cut them down since the sheer number is breaking some of the tooling we have. Due to lack of activity on this pull request, I'm going to push a commit to close it. Feel free to reopen it or create a new one. We can also continue the discussion on the JIRA ticket.

@asfgit closed this in 1a33f2e on Jun 15, 2016

7 participants