
Conversation

@liutang123
Contributor

@liutang123 liutang123 commented Mar 18, 2019

What changes were proposed in this pull request?

spark.task.cpus should be less than or equal to spark.executor.cores when using static executor allocation.
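
For example, a configuration this check rejects (an illustrative command using the same flags discussed later in this thread):

    $SPARK_HOME/bin/spark-shell --conf spark.executor.cores=1 --conf spark.task.cpus=2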

How was this patch tested?

manual

@AmplabJenkins

Can one of the admins verify this patch?

Member


(Why not?)

Contributor Author

@liutang123 liutang123 Mar 20, 2019


Because when running in local mode, just 1 core is available.

scala> sc.setLogLevel("INFO")
scala> sc.parallelize(1 to 9).collect

You can see Spark will hang after the log line INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks.
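
The hang follows from the scheduler's offer check: a task launches only when some offer has at least spark.task.cpus free cores. A minimal sketch of that condition (a simplified illustration, not the actual TaskSchedulerImpl code):

    val cpusPerTask = 2           // spark.task.cpus
    val availableCpus = Seq(1)    // plain "local" offers a single core
    val canLaunch = availableCpus.exists(_ >= cpusPerTask)
    // canLaunch is false on every offer, so the task set waits forever.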

Member


I see, local == local[1]

Member


Here and below, I don't think "local[...]" adds much beyond threadCount in the message.

Contributor Author


Please pardon me as my English isn't very good.
I do not understand this comment.

Member


You already report threadCount in the message; local[threads] doesn't add information. It can be removed.

Contributor Author


Sometimes threadCount is not the same as threads; for example, with local[*] the thread count resolves to the number of available processors.

@liutang123 liutang123 changed the title [SPARK-27192][Core] spark.task.cpus should be less or equal than spar… [SPARK-27192][Core] spark.task.cpus should be less or equal than spark.executor.cores Mar 20, 2019
Member


Can you just modify this little utility method to take a "cores" parameter and then use it in all the cases below? It can default to sc.conf.get(EXECUTOR_CORES), and then below you can set it to 1 for the local case, for example.
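
A minimal sketch of what that could look like (hypothetical name and call sites, using only the public SparkConf API):

    import org.apache.spark.{SparkConf, SparkException}

    // Take the effective core count as a parameter so every master
    // pattern can reuse the same check.
    def checkCpusPerTask(conf: SparkConf, cores: Int): Unit = {
      val taskCpus = conf.getInt("spark.task.cpus", 1)
      if (taskCpus > cores) {
        throw new SparkException(s"spark.task.cpus ($taskCpus) must be <= $cores.")
      }
    }

    // e.g. checkCpusPerTask(conf, 1) for "local",
    //      checkCpusPerTask(conf, threadCount) for "local[N]",
    //      checkCpusPerTask(conf, conf.getInt("spark.executor.cores", 1)) otherwise.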

@jiangxb1987
Contributor

Have you noticed this check in SparkConf?

if (contains(EXECUTOR_CORES) && contains(CPUS_PER_TASK)) {
  val executorCores = get(EXECUTOR_CORES)
  val taskCpus = get(CPUS_PER_TASK)
  if (executorCores < taskCpus) {
    throw new SparkException(
      s"${EXECUTOR_CORES.key} must not be less than ${CPUS_PER_TASK.key}.")
  }
}
Why do we still want to check in SparkContext?

@liutang123
Contributor Author

liutang123 commented Mar 21, 2019

@jiangxb1987 Thanks for review.
Sorry, I hadn't noticed the checking logic in SparkConf, but I think that checking logic is incomplete for local mode.
For example:
case 1:

$SPARK_HOME/bin/spark-shell --master local[3] --conf spark.task.cpus=2 --conf spark.executor.cores=1

local[3] means the executor's core count is 3, but #23290's logic will still throw an exception.
case 2:

$SPARK_HOME/bin/spark-shell --master local --conf spark.task.cpus=2 --conf spark.executor.cores=3
scala> sc.setLogLevel("INFO")
scala> sc.parallelize(1 to 9).collect

You can see Spark will hang after the log line INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks, but the checking logic in #23290 cannot catch this case.
So, I think we can check spark.task.cpus before creating the TaskScheduler.

Member


I don't think you need two messages here. Just have one stating that ${CPUS_PER_TASK.key} must be <= the $executorCoreNum cores available per executor.

Contributor Author


executorCoreNum is decided by local[N] or --conf spark.executor.cores=M; I think we should mention this in the exception.
And if the user sets both --master local[N] and --conf spark.executor.cores=M, I think we should ignore the latter.

Member


Is it possible to at least add back this test, for SparkContext?

Member


@liutang123 let's address this or close it; some of this is already checked

Contributor Author

@liutang123 liutang123 Mar 28, 2019


Hi @srowen, I added a UT in SparkContextSuite, which I think is more reasonable than this one.
If you think #23290 needs to be retained, this PR only needs to check local mode.
But imagine this case:
spark.executor.cores is set to 1 in conf/spark-defaults.conf as the default conf.
If the user runs spark-shell --master local[6] --conf spark.task.cpus=2, the checking logic in #23290 will throw an exception. This forces the user to set spark.executor.cores to at least 2, even though spark.executor.cores is meaningless in local mode. So, I think we can check spark.task.cpus before creating the SchedulerBackend and TaskScheduler - but what do you think?
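
A minimal sketch of such a master-aware check (a hypothetical helper; the local[*] handling and the default of 1 for spark.executor.cores are assumptions, and the merged change implements this inside SparkContext rather than as a free-standing method):

    import org.apache.spark.{SparkConf, SparkException}

    // Validate spark.task.cpus against the cores the chosen master actually
    // provides, ignoring spark.executor.cores in local mode.
    def checkCpusPerTask(conf: SparkConf, master: String): Unit = {
      val LocalN = """local\[([0-9]+|\*)\]""".r
      val effectiveCores = master match {
        case "local"     => 1
        case LocalN("*") => Runtime.getRuntime.availableProcessors()
        case LocalN(n)   => n.toInt
        case _           => conf.getInt("spark.executor.cores", 1) // default is an assumption
      }
      val taskCpus = conf.getInt("spark.task.cpus", 1)
      if (taskCpus > effectiveCores) {
        throw new SparkException(
          s"spark.task.cpus ($taskCpus) must be <= the $effectiveCores cores available per executor.")
      }
    }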

Member


Let's flip this around and have a test method that iterates over the possibilities.
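
A sketch of what such a data-driven test could look like (illustrative cases, assumed to run inside a ScalaTest suite such as SparkContextSuite):

    import org.apache.spark.{SparkConf, SparkContext, SparkException}

    Seq(
      ("local", 2, true),      // 1 core available, task needs 2 -> should fail
      ("local[2]", 2, false),  // 2 cores available, task needs 2 -> ok
      ("local[2]", 3, true)    // 2 cores available, task needs 3 -> should fail
    ).foreach { case (master, taskCpus, shouldFail) =>
      val conf = new SparkConf().setAppName("test").setMaster(master)
        .set("spark.task.cpus", taskCpus.toString)
      if (shouldFail) {
        val ex = intercept[SparkException] { new SparkContext(conf) }
        assert(ex.getMessage.contains("spark.task.cpus"))
      } else {
        new SparkContext(conf).stop()
      }
    }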

Member


Break the body onto a new line. Also, can you assert that the message contains a short substring that indicates the error is the expected one?
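
For instance (the asserted substring is an assumption based on the messages quoted earlier in this thread):

    val ex = intercept[SparkException] {
      sc = new SparkContext(conf)  // body broken onto its own line
    }
    assert(ex.getMessage.contains("spark.task.cpus"))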

@srowen
Member

srowen commented Mar 30, 2019

Merged to master

@srowen srowen closed this in f8fa564 Mar 30, 2019
@dongjoon-hyun
Member

dongjoon-hyun commented Mar 30, 2019

Hi, All.
This doesn't pass Jenkins, and it currently breaks the master branch due to UT failures. I'll revert this.

@srowen
Member

srowen commented Mar 30, 2019

Ugh, another error. I think I looked at the wrong PR when checking whether it passed. I'll revert as needed.

@dongjoon-hyun
Member

Oh, I reverted it already~

@dongjoon-hyun
Member

@liutang123, could you check the following tests and make another PR, please?

  • org.apache.spark.BarrierStageOnSubmittedSuite.submit a barrier ResultStage that requires more slots than current total under local-cluster mode
  • org.apache.spark.BarrierStageOnSubmittedSuite.submit a barrier ShuffleMapStage that requires more slots than current total under local-cluster mode
  • org.apache.spark.scheduler.CoarseGrainedSchedulerBackendSuite.compute max number of concurrent tasks can be launched when spark.task.cpus > 1
  • org.apache.spark.scheduler.CoarseGrainedSchedulerBackendSuite.compute max number of concurrent tasks can be launched when some executors are busy
  • org.apache.spark.scheduler.TaskSchedulerImplSuite.Scheduler correctly accounts for multiple CPUs per task
  • org.apache.spark.scheduler.TaskSchedulerImplSuite.Scheduler does not crash when tasks are not serializable
  • org.apache.spark.scheduler.TaskSchedulerImplSuite.don't schedule for a barrier taskSet if available slots are less than pending tasks
  • org.apache.spark.scheduler.TaskSchedulerImplSuite.schedule tasks for a barrier taskSet if all tasks can be launched together
