-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-16586] Change the way the exit code of launcher is handled to avoid problem when launcher fails #14231
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the problem that the JVM may not be able to allocate a 128m heap? I don't think that's supported at all; nothing will work. I don't think we want to complicate this much just to make a better error message, but if there's a particular exit status that indicates this that you can use to report a better message, OK. I wouldn't make the rest of this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right. I found this problem when I try to run spark for test on a login node of a HPC, which limit the amount of memory applications can use. This is not some big problem, so it's ok to mark this as "won't fix", but this might be confusing to users who don't understand shell scripts. Actually this patch is a small change and it reduce the total lines of codes (but I agree that this line is a little bit tricky and harder to understand). That's why I add some comments to explain what's happening.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, the previous code assume that the output of the launcher end with a '\0' otherwise it will crash due to the same problem. I read the codes in
org.apache.spark.launcher. It seems that the main of launcher rarely output something and exit a non-zero. So the only possibility that this is a problem is there are uncaught exceptions or there is something wrong with java.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it easier to just check the exit status of this process with
$?and exit with a better message if not 0?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The biggest problem to achieve this is: If success, the exit status of the process will be 0 and the output of the process will be something sperate by '\0'. Bash don't allow us to store '\0' in a bash variable, we should find some alternative way to store it, which makes $? no longer the exit status of the launcher..
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Things can also be implemented like
and
Which will minimize the change to the original code
If you prefer me to change the PR to do things like above, let me know, and I'd love to do that, or you can manually fix this easy bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eh, I'm talking about something much simpler.
Does that make sense?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work.
build_commandis executed in a subshell. Theexit $?will only terminate the subshell.You can try this on a bash shell
You will see that the shell didn't exit, and
CMDis set toaaaaThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha. OK, next question. Does that really need to be in a sub shell? You mentioned it has something to do with not being able to store the null char in bash -- that makes it impossible to just store the output of this command directly in a variable?
Really I'd punt to @vanzin who will be a more informed reviewer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can not find a neat solution to deal with the
\0...Another option is to replace the
\0with\nin and store it in a variable. But this will be a problem if the user's path of spark contains a\n...