[SPARK-27192][Core] spark.task.cpus should be less or equal than spark.executor.cores #24131
Conversation
Can one of the admins verify this patch?
Because when running in local mode, only 1 core is available.
scala> sc.setLogLevel("INFO")
scala> sc.parallelize(1 to 9).collect
You can see that Spark hangs after the log line INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks.
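For context, a minimal sketch of why it hangs (a conceptual paraphrase of the scheduling condition, not the actual TaskSchedulerImpl code; the variable names are illustrative):

// The scheduler only launches a task when an offer has at least
// spark.task.cpus free cores, so with a single core nothing is ever scheduled.
val coresPerExecutor = 1   // master = "local" gives a single core
val cpusPerTask = 2        // --conf spark.task.cpus=2
val canLaunchTask = coresPerExecutor >= cpusPerTask   // false, so the job hangs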
I see, local == local[1]
Here and below I don't think "local[...]" adds much beyond threadCount in the message
Please pardon me as my English isn't very good.
I do not understand this comment.
You already report threadCount in the message; local[threads] doesn't add information. It can be removed
Sometimes threadCount is not the same as threads, for example with local[*].
Can you just modify this little utility method to take a "cores" parameter and then use it in all the cases below? It can default to sc.conf.get(EXECUTOR_CORES), and then below you can set it to 1 for the local case, for example.
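A rough sketch of the refactor being suggested (the helper name and signature are illustrative, not the actual SparkContext code): the check takes the available cores as a parameter, and local mode can pass its thread count instead of the configured executor cores.

import org.apache.spark.SparkConf

// Illustrative helper: callers pass the number of cores actually available,
// e.g. the local[N] thread count in local mode, or spark.executor.cores otherwise.
def validateTaskCpus(conf: SparkConf, availableCores: Int): Unit = {
  val taskCpus = conf.getInt("spark.task.cpus", 1)
  require(taskCpus <= availableCores,
    s"spark.task.cpus ($taskCpus) must be <= the $availableCores cores available per executor")
}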
Have you noticed spark/core/src/main/scala/org/apache/spark/SparkConf.scala lines 580 to 588 (at 4808393)?
@jiangxb1987 Thanks for the review.
You can see that Spark will hang after the log line shown above.
I don't think you need two messages here. Just have one stating that ${CPUS_PER_TASK.key} must be <= the $executorCoreNum cores available per executor
executorCoreNum is determined by either local[N] or --conf spark.executor.cores=M, so I think we should include that in the exception message.
Also, if the user sets both --master local[N] and --conf spark.executor.cores=M, I think we should ignore the latter.
Is it possible to at least add back this test, for SparkContext?
@liutang123 let's address this or close it; some of this is already checked
Hi @srowen, I added a UT in SparkContextSuite, which I think is more reasonable than this one.
If you think the check in #23290 needs to be retained, this PR only needs to cover local mode.
But imagine this case:
spark.executor.cores 1 is set in conf/spark-defaults.conf as the default conf.
If the user runs spark-shell --master local[6] --conf spark.task.cpus=2, the checking logic in #23290 will throw an exception. That logic forces the user to set spark.executor.cores to at least 2, even though spark.executor.cores is meaningless in local mode. So I think we can check spark.task.cpus before creating the SchedulerBackend and TaskScheduler. What do you think?
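As a sketch of what checking before the scheduler is created could look like (the regex and helper name are illustrative, not the code in this PR): derive the usable core count from the master string in local mode and fall back to spark.executor.cores otherwise.

// Illustrative only: map the master string to the core count the check should use.
private val LocalN = """local\[([0-9]+|\*)\]""".r

def coresForCheck(master: String, executorCores: Int): Int = master match {
  case "local"     => 1
  case LocalN("*") => Runtime.getRuntime.availableProcessors()
  case LocalN(n)   => n.toInt
  case _           => executorCores   // cluster modes: use spark.executor.cores
}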
Let's flip this around and have a test method that iterates over the possibilities
Break the body onto a new line. Also, can you assert that the message contains a short substring that indicates the error is the expected one?
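Something along these lines is presumably what is being asked for (the surrounding suite setup and the exact exception type are assumptions, e.g. a LocalSparkContext-style suite that provides and cleans up a var sc): loop over the invalid masters and assert on a short substring of the message.

// Illustrative test body: each invalid combination should fail fast with a
// message that names spark.task.cpus.
Seq("local", "local[2]").foreach { master =>
  val conf = new SparkConf()
    .setAppName("test")
    .setMaster(master)
    .set("spark.task.cpus", "4")
  val e = intercept[Exception] {
    sc = new SparkContext(conf)
  }
  assert(e.getMessage.contains("spark.task.cpus"))
}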
Merged to master
Hi, All.
Ugh another error. I think I looked at the wrong PR when checking whether it passed. I'll revert as needed.
Oh, I reverted it already~
@liutang123, could you check the following tests and make another PR, please?
What changes were proposed in this pull request?
spark.task.cpus should be less than or equal to spark.executor.cores when static executor allocation is used.
How was this patch tested?
manual