Closed
23 changes: 12 additions & 11 deletions bin/spark-class
@@ -65,24 +65,25 @@ fi
 # characters that would be otherwise interpreted by the shell. Read that in a while loop, populating
 # an array that will be used to exec the final command.
 #
-# The exit code of the launcher is appended to the output, so the parent shell removes it from the
-# command array and checks the value to see if the launcher succeeded.
-build_command() {
-  "$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"
-  printf "%d\0" $?
-}
+# To keep both the output and the exit code of the launcher, the output is first converted to a hex
+# dump, which prevents bash from dropping the NULL characters, and the exit code is retrieved
+# from the bash array ${PIPESTATUS[@]}.
+#
+# Note that the NULL separator cannot be replaced with a space or '\n', so that the command
+# won't fail if some path of the user contains special characters such as '\n' or a space.
+#
+# Also note that when the launcher fails, it might not output something ending with '\0' [SPARK-16586]
+_CMD=$("$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"|xxd -p|tr -d '\n';exit ${PIPESTATUS[0]})

Member

Is the problem that the JVM may not be able to allocate a 128m heap? I don't think that's supported at all; nothing will work. I don't think we want to complicate this much just to make a better error message, but if there's a particular exit status that indicates this and that you can use to report a better message, OK. I wouldn't make the rest of this change.

Contributor Author

Yes, you are right. I found this problem when I tried to run Spark for testing on a login node of an HPC cluster, which limits the amount of memory applications can use. This is not a big problem, so it's OK to mark this as "won't fix", but it might be confusing to users who don't understand shell scripts. Actually, this patch is a small change and it reduces the total lines of code (though I agree that this line is a little tricky and harder to understand). That's why I added some comments to explain what's happening.

Contributor Author

By the way, the previous code assumes that the output of the launcher ends with a '\0'; otherwise it will crash due to the same problem. I read the code in org.apache.spark.launcher. It seems that the launcher's main rarely outputs something and then exits with a non-zero status. So the only way this becomes a problem is if there are uncaught exceptions or something is wrong with Java.
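
To make the failure mode concrete, here is a minimal sketch of the original read loop fed by a launcher that dies without printing a terminating '\0'. The `fake_launcher` function and its error text are made up for illustration; they stand in for a JVM that fails before the launcher's main runs.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the launcher: it fails and prints a message that
# is NOT terminated by '\0' (the message text is illustrative only).
fake_launcher() {
  echo "Error: could not reserve enough space for object heap"
  return 1
}

# Same shape as the original build_command: append the exit code plus '\0'.
build_command() {
  fake_launcher
  printf "%d\0" $?
}

CMD=()
while IFS= read -d '' -r ARG; do
  CMD+=("$ARG")
done < <(build_command)

LAST=$(( ${#CMD[@]} - 1 ))
LAUNCHER_EXIT_CODE=${CMD[$LAST]}

# The error text and the exit code were fused into a single array element.
printf 'elements: %d\nlast element: %q\n' "${#CMD[@]}" "$LAUNCHER_EXIT_CODE"

# Unquoted, that element splits into many words, so `[` reports an error
# (status 2) instead of evaluating the comparison.
TEST_STATUS=0
[ $LAUNCHER_EXIT_CODE != 0 ] 2>/dev/null || TEST_STATUS=$?
echo "status of the [ ... ] check: $TEST_STATUS"
```

Because the check itself errors out, the `if` in the original script never takes the branch that exits with the launcher's status.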

Member

Is it easier to just check the exit status of this process with $? and exit with a better message if not 0?

Contributor Author

The biggest obstacle to doing this is: on success, the exit status of the process will be 0 and the output of the process will be something separated by '\0'. Bash doesn't allow us to store '\0' in a bash variable, so we have to find some alternative way to store it, which means $? is no longer the exit status of the launcher.
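
A small sketch of both halves of that point, assuming `xxd` is installed: plain command substitution loses the '\0' separators, while hex-encoding the output and exiting the substitution with `${PIPESTATUS[0]}` keeps both the arguments and the launcher's status. `fake_launcher` and its arguments are hypothetical stand-ins.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the launcher: prints NUL-separated arguments and
# returns the status given as its first parameter (default 0).
fake_launcher() {
  printf 'java\0-cp\0/opt/my spark/jars/*\0'
  return "${1:-0}"
}

# 1. Plain command substitution drops the NUL bytes, so the argument
#    boundaries are gone. (Newer bash versions warn on stderr; silence that.)
{ RAW=$(fake_launcher); } 2>/dev/null
printf 'raw: %s\n' "$RAW"

command -v xxd >/dev/null || { echo "xxd not installed, skipping"; exit 0; }

# 2. Hex-encode first, then leave the substitution with the launcher's status.
HEX=$(fake_launcher | xxd -p | tr -d '\n'; exit "${PIPESTATUS[0]}")
STATUS=$?

# 3. Decode and split on NUL again.
ARGS=()
while IFS= read -d '' -r ARG; do
  ARGS+=("$ARG")
done < <(printf '%s' "$HEX" | xxd -r -p)
printf 'status=%s, %d args, last=%s\n' "$STATUS" "${#ARGS[@]}" "${ARGS[2]}"

# 4. A failing launcher: its status survives the pipeline.
FAIL_STATUS=0
HEX=$(fake_launcher 3 | xxd -p | tr -d '\n'; exit "${PIPESTATUS[0]}") || FAIL_STATUS=$?
echo "failing launcher status: $FAIL_STATUS"
```

The cost is readability and a dependency on `xxd`, which is the trade-off being discussed in this thread.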

Contributor Author
@zasdfgbnm, Jul 27, 2016

This could also be implemented as follows:

# line 70-73
build_command() {
  "$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"
  LAUNCHER_EXIT_CODE=$?
  if [ "$LAUNCHER_EXIT_CODE" != 0 ]; then printf "\0"; fi
  printf "%d\0" $LAUNCHER_EXIT_CODE
}

and

# line 83-87
CMD=("${CMD[@]:0:$LAST}")
if [ "$LAUNCHER_EXIT_CODE" != 0 ]; then
  echo "${CMD[@]}"
  exit $LAUNCHER_EXIT_CODE
fi

This will minimize the change to the original code.
If you would prefer that I change the PR to do it this way, let me know and I'd be happy to; or you can fix this easy bug manually.
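
The two fragments above can be tried end to end with a stubbed launcher. This is only a sketch: `fake_launcher` and its message are hypothetical, and the final `exit` is replaced by an `echo` so the snippet can be pasted into an interactive shell.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a launcher that fails without a trailing '\0'.
fake_launcher() {
  echo "Error: launcher could not start"
  return 1
}

build_command() {
  fake_launcher
  LAUNCHER_EXIT_CODE=$?
  # On failure, terminate whatever the launcher printed so that the exit
  # code always ends up in an array element of its own.
  if [ "$LAUNCHER_EXIT_CODE" != 0 ]; then printf "\0"; fi
  printf "%d\0" $LAUNCHER_EXIT_CODE
}

CMD=()
while IFS= read -d '' -r ARG; do
  CMD+=("$ARG")
done < <(build_command)

COUNT=${#CMD[@]}
LAST=$((COUNT - 1))
LAUNCHER_EXIT_CODE=${CMD[$LAST]}
CMD=("${CMD[@]:0:$LAST}")
if [ "$LAUNCHER_EXIT_CODE" != 0 ]; then
  echo "${CMD[@]}"                                 # the launcher's message
  echo "would exit with status $LAUNCHER_EXIT_CODE"
fi
```

With the extra '\0', the last element is the bare exit code and the launcher's message sits in the element before it.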

Member

Eh, I'm talking about something much simpler.

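build_command() {
  "$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"
  LAUNCHER_EXIT_CODE=$?  # save it now; every later command overwrites $?
  if [ "$LAUNCHER_EXIT_CODE" != 0 ]; then
    echo "Couldn't run launcher"
    exit "$LAUNCHER_EXIT_CODE"
  fi
  printf "%d\0" "$LAUNCHER_EXIT_CODE"
}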

Does that make sense?

Contributor Author
@zasdfgbnm, Jul 28, 2016

This won't work. build_command is executed in a subshell, so the exit $? will only terminate the subshell.
You can try this in a bash shell:

build_command(){   printf "aaaa\0";  exit 1; }
CMD=()
while IFS= read -d '' -r ARG; do   CMD+=("$ARG"); done < <(build_command)

You will see that the shell didn't exit, and CMD is set to aaaa.

Member

Aha. OK, next question. Does that really need to be in a subshell? You mentioned it has something to do with not being able to store the null char in bash; does that make it impossible to just store the output of this command directly in a variable?

Really I'd punt to @vanzin who will be a more informed reviewer.
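
One way to avoid the subshell, which is not proposed anywhere in this PR and is shown here only as a hedged sketch, is to send the launcher output to a temporary file. The launcher then runs in the current shell, so `$?` is its real exit status and the '\0' separators never pass through a variable. `fake_launcher`, its arguments, and the use of `mktemp` are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the launcher: prints NUL-separated arguments and
# returns the status given as its first parameter (default 0).
fake_launcher() {
  printf 'java\0-Xmx1g\0org.example.Main\0'
  return "${1:-0}"
}

OUT=$(mktemp) || exit 1
trap 'rm -f "$OUT"' EXIT

# No pipeline and no command substitution: $? belongs to the launcher.
fake_launcher 0 > "$OUT"
LAUNCHER_EXIT_CODE=$?

if [ "$LAUNCHER_EXIT_CODE" != 0 ]; then
  tr '\0' ' ' < "$OUT" >&2       # show whatever the launcher printed
  exit "$LAUNCHER_EXIT_CODE"
fi

CMD=()
while IFS= read -d '' -r ARG; do
  CMD+=("$ARG")
done < "$OUT"

printf '%s\n' "${CMD[@]}"
```

The price is a temporary file on every launch, which may or may not be acceptable for a script like spark-class.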

Contributor Author

I cannot find a neat solution for dealing with the \0...
Another option is to replace the \0 with \n and store the result in a variable. But this will be a problem if the user's Spark path contains a \n...
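
The newline problem is easy to reproduce. The sketch below uses a made-up path containing an embedded newline and splits the same three arguments once on '\n' and once on '\0'.

```shell
#!/usr/bin/env bash
# Hypothetical Spark path with an embedded newline.
WEIRD_PATH=$'/opt/spark\nhome/jars'

# Newline as separator: the path is cut in two.
NL_ARGS=()
while IFS= read -r ARG; do
  NL_ARGS+=("$ARG")
done < <(printf '%s\n' java -cp "$WEIRD_PATH")

# NUL as separator: the path survives as a single argument.
NUL_ARGS=()
while IFS= read -d '' -r ARG; do
  NUL_ARGS+=("$ARG")
done < <(printf '%s\0' java -cp "$WEIRD_PATH")

echo "newline-separated: ${#NL_ARGS[@]} arguments"   # 4
echo "NUL-separated:     ${#NUL_ARGS[@]} arguments"  # 3
```

NUL is the only byte that cannot appear in a path or a command-line argument, which is why it is the safe separator here.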

+LAUNCHER_EXIT_CODE=$?

 CMD=()
 while IFS= read -d '' -r ARG; do
   CMD+=("$ARG")
-done < <(build_command "$@")
+done < <(echo $_CMD|xxd -r -p)

-COUNT=${#CMD[@]}
-LAST=$((COUNT - 1))
-LAUNCHER_EXIT_CODE=${CMD[$LAST]}
 if [ $LAUNCHER_EXIT_CODE != 0 ]; then
+  echo $_CMD|xxd -r -p|tr '\0' ' '
   exit $LAUNCHER_EXIT_CODE
 fi

-CMD=("${CMD[@]:0:$LAST}")
 exec "${CMD[@]}"