Conversation

@pgandhi999

…rEnv properly

Running in YARN cluster mode, trying to set PYTHONPATH via spark.yarn.appMasterEnv.PYTHONPATH doesn't work.

The YARN Client code looks at the env variables:
val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
But when you set spark.yarn.appMasterEnv, it puts the value into the local env.

So the Python path set in spark.yarn.appMasterEnv isn't properly set.

You can work around this in cluster mode by setting it on the client, for example:

PYTHONPATH=./addon/python/ spark-submit

What changes were proposed in this pull request?

In Client.scala, PYTHONPATH was being overridden, so the code was changed to append values to PYTHONPATH instead of overriding them.
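
For illustration only, a minimal sketch of the append-instead-of-override idea, assuming the surrounding Client.scala context (pythonPath is the mutable list of entries Spark adds, env is the launch-environment map that already holds any spark.yarn.appMasterEnv.* values, and Hadoop's ApplicationConstants is in scope); this is not the exact merged diff:

// Before: overrides env("PYTHONPATH"), dropping whatever spark.yarn.appMasterEnv.PYTHONPATH put there.
val overridden = (sys.env.get("PYTHONPATH") ++ pythonPath)
  .mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
env("PYTHONPATH") = overridden

// After: append, so the appMasterEnv value, the launcher's PYTHONPATH, and Spark's entries all survive.
val appended = (env.get("PYTHONPATH") ++ sys.env.get("PYTHONPATH") ++ pythonPath)
  .mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
env("PYTHONPATH") = appended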

How was this patch tested?

Added log statements to ApplicationMaster.scala to check for the environment variable PYTHONPATH, ran a Spark job in cluster mode before the change, and verified the issue. Performed the same test after the change and verified the fix.

@tgravescs
Contributor

Jenkins, ok to test

@SparkQA

SparkQA commented May 31, 2018

Test build #91345 has finished for PR 21468 at commit 0aee8fa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@redsanket

LGTM @pgandhi999. Hope @tgravescs can confirm it.

@pgandhi999
Author

Thank you for your feedback, @redsanket

.mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
env("PYTHONPATH") = pythonPathStr
sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
val newValue =
Contributor

You could just say env.get("PYTHONPATH") ++=: pythonPath before turning the list into a string.

But there are also two extra questions here:

  • precedence: should the env come before or after the files added with py-files? I kinda think after makes more sense, since files are generally provided on the command line.
  • should appMasterEnv be reflected in executors? With your code it is. I'm not so sure it should.

Contributor

good questions

  • precedence: Right now you can work around this issue by exporting PYTHONPATH before you launch spark-submit. That could just be something sitting in someone's env on the launcher box and might not be what you want in a YARN container, so I would think that explicitly specifying the Python path via spark.yarn.appMasterEnv should take precedence over it, since you explicitly configured it. The second question is where py-files fits in, and that one isn't as clear to me since, like you said, it's explicitly specified. Maybe we do py-files, then spark.yarn.appMasterEnv.PYTHONPATH, and then last the env PYTHONPATH (a sketch of that ordering follows this list). That is different from the way it is now, though. Thoughts?

  • agree this should not be reflected in the executors, so if it is, we shouldn't do that. We should make sure spark.executorEnv.PYTHONPATH works.
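
For illustration, a sketch of the ordering proposed above, reusing the Client.scala names (pythonPath, env, ApplicationConstants); this is just the proposed ordering, not the merged code:

val orderedPythonPath = (
  pythonPath ++                    // entries Spark adds (py-files, pyspark archives)
  env.get("PYTHONPATH") ++         // spark.yarn.appMasterEnv.PYTHONPATH, already copied into env
  sys.env.get("PYTHONPATH")        // PYTHONPATH exported on the launcher box
).mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)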

Author

Thank you for your comments, @vanzin, I have made the necessary changes. As far as precedence is concerned, I was not sure whether I understood your question at first, but @tgravescs clarified it for me.

Contributor

Also note, as @vanzin said, you can just use the ++=: operator with the ListBuffer to prepend and get rid of the if conditions before converting to a string:
env.get("PYTHONPATH") ++=: (sys.env.get("PYTHONPATH") ++=: pythonPath)
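
For reference, a small standalone illustration of how the right-associative ++=: operator prepends to a ListBuffer (the values below are made up for the example):

import scala.collection.mutable.ListBuffer

val pythonPath = ListBuffer("pyspark.zip", "py4j-src.zip")
// `xs ++=: buf` desugars to `buf.++=:(xs)` and prepends the elements of xs in place.
Some("/opt/addon/python") ++=: pythonPath
// pythonPath is now ListBuffer("/opt/addon/python", "pyspark.zip", "py4j-src.zip")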

Author

@tgravescs Have replaced the if-else code with the ++ operator.

- spark.executorEnv should not be overridden by appMasterEnv

- Replacing ++ with ++=:
@SparkQA

SparkQA commented Jun 8, 2018

Test build #91571 has finished for PR 21468 at commit 5e733ae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

ok to test

@SparkQA

SparkQA commented Jun 9, 2018

Test build #91603 has finished for PR 21468 at commit 5e733ae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@pgandhi999
Author

Thank you @HyukjinKwon

@tgravescs
Contributor

Sorry for the delay on this, will get to it next Monday.

Contributor

@vanzin vanzin left a comment

The code seems correct but could be simplified a lot, which also would help with clarity.

// Finally, update the Spark config to propagate PYTHONPATH to the AM and executors.
if (pythonPath.nonEmpty) {
val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
val pythonPathStr = (sys.env.get("PYTHONPATH") ++=: pythonPath)
Contributor

This is modifying pythonPath, but it's not used again after this line, so why?

Author

Makes sense, have changed it to ++

.mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
env("PYTHONPATH") = pythonPathStr
sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
val newValue =
Contributor

This seems unnecessary. What you're trying to do here is concatenate the values of the current PYTHONPATH env variable, the PYTHONPATH created by Spark, and the PYTHONPATH set in spark.yarn.appMasterEnv.

That's, respectively, sys.env.get("PYTHONPATH"), pythonPath, and env.get("PYTHONPATH"). So just concatenate those.

Author

I would have done it, but I need the pythonPathStr variable to set it in executorEnv, so I have not modified that bit. Let me know if you think otherwise.

pythonPathStr
}
env("PYTHONPATH") = newValue
if (!sparkConf.getExecutorEnv.toMap.contains("PYTHONPATH")) {
Contributor

@vanzin vanzin Jun 25, 2018

I see that the previous code was overriding this in the executor env; but perhaps the right thing here is to concatenate them, otherwise the executor might be missing the py4j/pyspark stuff this class adds.

So, basically, what you want is the following (see the sketch after this list):

  • driver: env.get(pp) ++ sys.env.get(pp) ++ pythonPath
  • executor: sparkConf.getExecutorEnv(pp) ++ sys.env.get(pp) ++ pythonPath
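
A minimal sketch of that suggestion, reusing the Client.scala names (env, pythonPath, sparkConf, ApplicationConstants), where "pp" above is shorthand for PYTHONPATH; illustrative only, not the exact merged code:

val sep = ApplicationConstants.CLASS_PATH_SEPARATOR

// Driver / AM: appMasterEnv value (already in env), then the launcher's PYTHONPATH, then Spark's entries.
val driverPythonPath =
  (env.get("PYTHONPATH") ++ sys.env.get("PYTHONPATH") ++ pythonPath).mkString(sep)
env("PYTHONPATH") = driverPythonPath

// Executors: spark.executorEnv value, then the launcher's PYTHONPATH, then Spark's entries.
val executorPythonPath =
  (sparkConf.getExecutorEnv.toMap.get("PYTHONPATH") ++ sys.env.get("PYTHONPATH") ++ pythonPath)
    .mkString(sep)
sparkConf.setExecutorEnv("PYTHONPATH", executorPythonPath)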

Author

@pgandhi999 pgandhi999 Jun 28, 2018

That is indeed an issue, fixed it. Thanks for pointing that out.

}
env("PYTHONPATH") = newValue
if (!sparkConf.getExecutorEnv.toMap.contains("PYTHONPATH")) {
sparkConf.setExecutorEnv("PYTHONPATH", newValue)
Contributor

When I did some testing of this, it was causing the job to fail; Parth is investigating. I agree with @vanzin on the executor path comment as well.

…THONPATH

Instead of overriding the executorEnv PYTHONPATH, append the existing PYTHONPATH string to it, and avoid passing the value of appMasterEnv.PYTHONPATH to the executorEnv variable.
@SparkQA

SparkQA commented Jun 28, 2018

Test build #92434 has finished for PR 21468 at commit 6ba543e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

} else {
val pythonPathExecutorEnv = sparkConf.getExecutorEnv.toMap.get("PYTHONPATH").get +
ApplicationConstants.CLASS_PATH_SEPARATOR + pythonPathStr
sparkConf.setExecutorEnv("PYTHONPATH", pythonPathExecutorEnv)
Contributor

move the setExecutorEnv outside of the if and have the if return the value.

Author

Done

@SparkQA

SparkQA commented Jul 16, 2018

Test build #93128 has finished for PR 21468 at commit 5423bef.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 16, 2018

Test build #93140 has finished for PR 21468 at commit 49f37a8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs
Contributor

+1

@tgravescs
Contributor

Merged, thanks @pgandhi999

@asfgit asfgit closed this in 1272b20 Jul 18, 2018