@@ -318,7 +318,7 @@ private[spark] class SparkSubmit extends Logging {

if (!StringUtils.isBlank(resolvedMavenCoordinates)) {
args.jars = mergeFileLists(args.jars, resolvedMavenCoordinates)
-      if (args.isPython) {
+      if (args.isPython || isInternal(args.primaryResource)) {
Member:
These will be dumb questions since I don't know this code, but is this specific just to Python? This seems to set pyFiles even when not running a Python job or am I missing something?

Contributor Author:

The code inside the if() block properly sets the "spark.submit.pyFiles" config, but it only gets executed when args.isPython is true. SparkSubmit uses the primary resource's suffix to determine args.isPython. For the NO_RESOURCE type ("spark-internal" as the primary resource), we should also do this, because a job submitted that way might still use Python.
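To make the decision logic concrete, here is a minimal, self-contained Scala sketch of how the two flags interact in the patched condition. The helper names (`isInternal`, `isPython`, `shouldMergePyFiles`) and the `"pyspark-shell"` sentinel follow the discussion above, not the exact Spark source:

```scala
// Hedged sketch of the flag logic discussed above; names are illustrative,
// not copied from SparkSubmit itself.
object ResourceFlags {
  // Spark uses "spark-internal" as the primary resource for the
  // NO_RESOURCE case (no user jar or script supplied).
  val SparkInternal = "spark-internal"

  // A submission is "internal" when its primary resource is the sentinel.
  def isInternal(primaryResource: String): Boolean =
    primaryResource == SparkInternal

  // args.isPython is derived from the primary resource's suffix
  // (or the pyspark shell sentinel).
  def isPython(primaryResource: String): Boolean =
    primaryResource.endsWith(".py") || primaryResource == "pyspark-shell"

  // The patched condition: merge resolved Maven coordinates into pyFiles
  // for Python apps AND for internal (NO_RESOURCE) submissions.
  def shouldMergePyFiles(primaryResource: String): Boolean =
    isPython(primaryResource) || isInternal(primaryResource)
}
```

With this sketch, `shouldMergePyFiles("app.py")` and `shouldMergePyFiles("spark-internal")` both hold, while a plain jar (`"app.jar"`) still skips the pyFiles merge.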

Member:
Yeah I get what the code does, was just wondering why it always sets a pyfiles now even when it's not a pyspark app. But the answer is that pyspark apps also need resolved Maven dependencies, I believe. @vanzin does this look right?

args.pyFiles = mergeFileLists(args.pyFiles, resolvedMavenCoordinates)
}
}