[SPARK-2678][Core] Prevents spark-submit from shadowing application options
#1699
Conversation
---
I think this is a welcome change, but I don't like the compatibility-breaking aspect of it. It shouldn't be hard to maintain the current semantics (stop parsing spark-submit options when an unknown option is found) while still adding support for `--`. It will be sub-optimal for those who run into the situation you described, but on the other hand it won't break every single existing application out there using spark-submit.
---
Hey @liancheng, why not just do an alternative fix where we avoid shadowing user options but don't change the documented format? The main issue, as I see it, is that we aren't currently implementing the specification we give in the docs. I don't think that implies we need to change the format.
---
@vanzin @pwendell Are you suggesting keeping the stop-at-first-unrecognized-option behavior and only adding `--` on top of it? This logic accepts some invocations happily, but we still need to rewrite the invocations used in those scripts that delegate to `spark-submit`. Take `sbin/start-thriftserver.sh` as an example:

```bash
CLASS=org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
exec "$FWDIR"/bin/spark-submit --class $CLASS spark-internal "$@"
```

Note that:

```bash
# Case A
./sbin/start-thriftserver.sh --hiveconf hive.root.logger=DEBUG,console

# Case B
./sbin/start-thriftserver.sh --master local -- --hiveconf hive.root.logger=DEBUG,console
```

Case A is compatible with the current documented design, while Case B still requires the user to add `--`. (And I haven't updated the documentation yet; I will do that after we come to a final conclusion.)
---
@liancheng no, I'm suggesting keeping the current behavior, which is defined in `SparkSubmitArguments.scala` around line 310. What that code does is: as soon as an unrecognized option is found, it considers everything else as app options. That means it doesn't matter where in the command line the "primary resource" is; everything after it goes to the application either way.
The issue you point out, and that's true, is that if an application option appears before the primary resource, `spark-submit` will either capture it or choke on it. Note that the fix doesn't require anyone to change their scripts, except the user who wants his application to still get an option that `spark-submit` would otherwise capture. Basically, don't change any of the existing code; just add a case that handles `--` to stop parsing `spark-submit` options. That should be very, very simple to do: it could be as simple as adding one extra case to the parser. (Although anything sketched here is obviously completely untested.)
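For illustration only, here is a minimal, untested sketch of what that extra case might look like, assuming the recursive `parse` over a `List[String]` discussed above; `inSparkOpts`, `childArgs`, and `primaryResource` are stand-ins mirroring the discussion, not the exact fields in `SparkSubmitArguments`:

```scala
object ParseSketch {
  var inSparkOpts = true
  var primaryResource: Option[String] = None
  val childArgs = scala.collection.mutable.ArrayBuffer.empty[String]

  def parse(opts: List[String]): Unit = opts match {
    case "--" :: tail =>                  // the only new case: explicit separator
      inSparkOpts = false
      parse(tail)
    case ("--master" | "--class") :: _ :: tail if inSparkOpts =>
      parse(tail)                         // stands in for the existing option handling
    case value :: tail if inSparkOpts =>  // existing: first unknown arg is the resource
      primaryResource = Some(value)
      inSparkOpts = false
      parse(tail)
    case value :: tail =>                 // existing: everything else goes to the app
      childArgs += value
      parse(tail)
    case Nil => // done
  }
}
```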
---
I'm worried that treating unknown args as app args would make typos difficult to debug: `spark-submit --executor-croes 10` would silently hand the misspelled option to the application instead of failing fast.
---
Yeah, I think we can keep full backwards compatibility with the current approach and then just also support adding `--`.
---
Hey @liancheng, I just spoke with @mateiz offline about this for a while. He had feedback on a few things, which I'll summarize here. There are a couple of orthogonal issues going on here. His suggestion was to keep the existing stop-at-first-unrecognized-option behavior, so user programs that used the old positional format keep working unchanged, or they can pass `--` explicitly when an application option would otherwise be captured. So basically, when the parser arrives at an unrecognized option (which we assume to be a resource), we always treat the rest of the list as user options, even if the user options happen to have a name that collides with a `spark-submit` option.
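To make those semantics concrete, here is a self-contained sketch; `splitArguments` and the tiny `submitOpts` table are hypothetical names invented for the demo, not Spark code:

```scala
object SplitDemo {
  // Tiny stand-in for the options spark-submit itself recognizes.
  private val submitOpts = Set("--master", "--class", "--name")

  /** Hypothetical splitter: (sparkSubmitArgs, primaryResource, userArgs). */
  def splitArguments(args: List[String]): (List[String], Option[String], List[String]) =
    args match {
      case opt :: value :: rest if submitOpts(opt) =>
        val (spark, res, user) = splitArguments(rest)
        (opt :: value :: spark, res, user)
      case resource :: rest =>
        // First unrecognized argument is assumed to be the resource; everything
        // after it is handed to the application verbatim, even "--master".
        (Nil, Some(resource), rest)
      case Nil =>
        (Nil, None, Nil)
    }

  def main(args: Array[String]): Unit = {
    println(splitArguments(List("--master", "local", "app.jar", "--master", "yarn")))
    // => (List(--master, local), Some(app.jar), List(--master, yarn))
  }
}
```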
---
Thanks for the great feedback. I agree; will update soon.
---
Also, thank you @vanzin! You're right, it can be done in a backward-compatible yet simple way.
---

JIRA issue: [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678)
Currently, `spark-submit` shadows user application options. Namely, options like `--help` are always captured by `spark-submit` and won't be passed to the user application. A negative impact of this is that every time we add a new option to `spark-submit`, we may break existing user scripts if the new option name happens to shadow an existing option in the application.

This PR introduces an incompatible change to fix the issue: `--` is used as a separator between `spark-submit` options and user application options, and all arguments after `--` are passed to the application as is. Thus,

```bash
./bin/spark-submit --class Foo user.jar arg1 -arg2 --arg3 x
# or
./bin/spark-submit user.jar --class Foo arg1 -arg2 --arg3 x
```

must be rewritten to

```bash
./bin/spark-submit --class Foo user.jar -- arg1 -arg2 --arg3 x
# or
./bin/spark-submit user.jar --class Foo -- arg1 -arg2 --arg3 x
```

@pwendell Please help review, thanks. In particular, maybe we need a vote here to decide whether fixing this issue is worth an incompatible change.
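For illustration, the separator rule described above amounts to splitting the argument vector at the first `--`; a hypothetical script-style sketch, not the PR's actual code:

```scala
// Split the argument vector at the first "--".
val args = Seq("--class", "Foo", "user.jar", "--", "arg1", "-arg2", "--arg3", "x")
val (submitArgs, rest) = args.span(_ != "--") // everything before the first "--"
val appArgs = rest.drop(1)                    // drop the "--" marker itself
// submitArgs == Seq("--class", "Foo", "user.jar")
// appArgs   == Seq("arg1", "-arg2", "--arg3", "x")
```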