Add the validation of spark.cores.max under Streaming #18948
Conversation
Can one of the admins verify this patch?
I don't think that's valid. It doesn't catch all cases where you need > 1 core, and isn't necessary for cases that don't involve receivers, I think.
Take the NetworkWordCount example: if it is run with '--total-executor-cores 1', it cannot succeed.
That's not the problem -- I'm saying there are cases where 1 is valid. There are obviously cases where it isn't. But your change makes them all fail.
Ok, sorry, I'll review it, thanks.
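For context on the NetworkWordCount case mentioned above, here is a minimal receiver-based sketch (host and port are placeholders): the socket receiver runs as a long-lived task that occupies one core, so with only one total executor core nothing is left for processing the received batches.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object MinimalNetworkWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MinimalNetworkWordCount")
    val ssc = new StreamingContext(conf, Seconds(1))

    // The socket receiver runs as a long-lived task and pins one core;
    // with only one total executor core, no core remains for batch processing.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```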
The change under review adds this check (with surrounding context from the diff):

    }

    if (sc.conf.contains("spark.cores.max")) {
      val totalCores = sc.conf.getInt("spark.cores.max", 1)
      // remainder of the proposed check; per the discussion below, it throws
      // an exception when totalCores <= 1
    }
AFAIK, "spark.cores.max" is not set by default; with your change it will always return 1 if not set and throw an exception, as you wrote.
Besides, if there are several receivers in a streaming application, even > 1 is not sufficient. So simply checking "spark.cores.max" is not a feasible way to address the issue, and there may not be a good way to handle this problem properly, so I'd suggest leaving the current code as it is.
The config spark.cores.max is used to limit the max number of cores that a single executor can require, and as @jerryshao pointed out, it's not set by default. I'm not convinced that validating the config can benefit your issue.
@jerryshao spark.cores.max is read only when the conf contains it.
This seems like the most feasible place to make such a check.
@SOmeONee I'm not quite following your comment here. You assume that the default value is "1", which changes the semantics of this configuration. What's more, checking whether it is "<= 1" is not sufficient, as I mentioned above.
@jiangxb1987 "spark.cores.max" is a per-application configuration that limits the number of cores that can be requested for the application; it is not a per-executor limitation.
> The config spark.cores.max is used to limit the max number of cores that a single executor can require
So still, if we have 2 receivers in one streaming application, the minimum number should be > 2; checking against "1" here is still not feasible.
Since the receiver number can only be known at run time, checking the configuration will not work as expected.
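As an illustration of that point, a hypothetical sketch of the kind of run-time check that would be needed instead (the `checkCoresForReceivers` helper and its `numReceivers` parameter are stand-ins for illustration, not existing Spark APIs):

```scala
import org.apache.spark.SparkConf

// Hypothetical run-time sanity check; numReceivers must be supplied by the
// caller because the receiver count is only known after the DStreams are set up.
def checkCoresForReceivers(conf: SparkConf, numReceivers: Int): Unit = {
  if (conf.contains("spark.cores.max")) {
    val totalCores = conf.getInt("spark.cores.max", 1)
    require(totalCores > numReceivers,
      s"spark.cores.max=$totalCores is not enough for $numReceivers receiver(s); " +
        "at least one extra core is needed for batch processing.")
  }
}
```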
Yeah, it seems the validation here is not very useful.
@srowen I reviewed again and ran the tests; there are no test cases with spark.cores.max=1 in streaming.
The patch here is not solid; we will not merge it unless you have a better solution.
Closes apache#18522 Closes apache#17722 Closes apache#18879 Closes apache#18891 Closes apache#18806 Closes apache#18948 Closes apache#18949 Closes apache#19070 Closes apache#19039 Closes apache#19142 Closes apache#18515 Closes apache#19154 Closes apache#19162 Closes apache#19187 Closes apache#19091 Author: Sean Owen <[email protected]> Closes apache#19203 from srowen/CloseStalePRs3.
What changes were proposed in this pull request?
When using Spark Streaming, --total-executor-cores must be greater than 1.
This pull request adds validation of spark.cores.max.
How was this patch tested?
manual tests:
--total-executor-cores 1
--total-executor-cores 2
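For illustration, such a manual test could be run along these lines (master URL, jar path, host, and port are placeholders, not taken from the PR):

```
./bin/spark-submit \
  --class org.apache.spark.examples.streaming.NetworkWordCount \
  --master spark://<master-host>:7077 \
  --total-executor-cores 1 \
  examples/jars/spark-examples_*.jar localhost 9999
```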
Please review http://spark.apache.org/contributing.html before opening a pull request.