[SPARK-27936][K8S] Support python deps #25870
Conversation
...ration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala — resolved review comment
Kubernetes integration test starting
Kubernetes integration test status success
Test build #111078 has finished for PR 25870 at commit
@erikerlandson @holdenk pls review. I know there are some other changes (other PRs) related to the tests, I can always rebase if they get merged first.
| .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") | ||
| .set("spark.jars.packages", "com.amazonaws:aws-java-sdk:" + | ||
| "1.7.4,org.apache.hadoop:hadoop-aws:2.7.6") | ||
| .set("spark.driver.extraJavaOptions", "-Divy.cache.dir=/tmp -Divy.home=/tmp") |
I can refactor this part to set these properties once, as they are shared with the existing test (see the sketch below). In general I think we should separate the suites in the future to allow a better setup for before and after conditions.
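For instance, one possible shape for that refactor; the helper name is hypothetical, and it assumes the suite's `SparkAppConf` builder seen in the diff above:

```scala
// Hypothetical helper: apply the S3A settings shared by the dependency tests,
// so each test only sets what differs (container image, python version, etc.).
private def setCommonDepsConf(conf: SparkAppConf): SparkAppConf =
  conf
    .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .set("spark.jars.packages", "com.amazonaws:aws-java-sdk:" +
      "1.7.4,org.apache.hadoop:hadoop-aws:2.7.6")
    .set("spark.driver.extraJavaOptions", "-Divy.cache.dir=/tmp -Divy.home=/tmp")
```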
cc @ifilonenko who I think did some related work?
@erikerlandson if you have some free time pls have a look :)
@skonto this LGTM
...ce-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala — outdated review comment (resolved)
...ration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala — resolved review comment
holdenk left a comment:
Follow-up question re: uploading.
...ce-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala — outdated review comment (resolved)
@holdenk this is because spark-submit adds the resource to the … So since the PR here uploads whatever is needed from …
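To illustrate the upload step being discussed, here is a minimal sketch of copying a launcher-local dependency to a Hadoop-compatible remote filesystem; the bucket and paths are made up, and the real PR drives this through Spark's submission code rather than a manual call:

```scala
import java.net.URI

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical upload location: resolve the remote filesystem from the URI
// and copy a dependency from the launcher's local disk to it.
val uploadDir = new URI("s3a://some-bucket/spark-upload")
val fs = FileSystem.get(uploadDir, new Configuration())
fs.copyFromLocalFile(
  new Path("/tmp/deps/pyspark_deps.zip"),       // local, on the launcher
  new Path(s"$uploadDir/pyspark_deps.zip"))     // remote, readable by the driver pod
```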
Force-pushed 3245e4e to 4ed8524
@erikerlandson I added one test which uses a zip file, as we discussed previously. In general there are many ways to set up the python env for a python Spark job on the driver/executor side, e.g. https://medium.com/criteo-labs/packaging-code-with-pex-a-pyspark-example-9057f9f144f3
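For context, a sketch of how such a zip-based submission can be configured; the bucket and local path are hypothetical, while `spark.kubernetes.file.upload.path` and `spark.submit.pyFiles` are the configs this line of work builds on:

```scala
// Dependencies bundled in a launcher-local zip are uploaded to the configured
// upload path and then distributed to the driver and executors.
sparkAppConf
  .set("spark.kubernetes.file.upload.path", "s3a://some-bucket/spark-upload")
  .set("spark.submit.pyFiles", "/path/on/launcher/pyspark_deps.zip")
```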
Thanks @skonto!
Kubernetes integration test starting
Kubernetes integration test status success
Kubernetes integration test starting
Kubernetes integration test status success
Test build #112171 has finished for PR 25870 at commit
Test build #112169 has finished for PR 25870 at commit
holdenk left a comment:
Minor comment about Python 2, but really excited to see this.
val (accessKey, secretKey) = getCephCredentials()
sparkAppConf
  .set("spark.kubernetes.container.image", pyImage)
  .set("spark.kubernetes.pyspark.pythonVersion", "2")
Let's not use Python 2, since this is targeted for Spark 3 I believe.
I got used to 2 for years, I will miss it :) Sure, will change.
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #112275 has finished for PR 25870 at commit
@erikerlandson can I get a merge? Gentle ping :)
Kubernetes integration test status failure
Kubernetes integration test status failure
Retest this please
Kubernetes integration test starting
Test build #122747 has finished for PR 25870 at commit
Kubernetes integration test status failure
Back to this.
I'm also back after my motorcycle crash last year; let me know when it's up to date and I'd be happy to take another look. Thanks for sticking with this.
I notice this is targeted for 3.1, is it worth trying to get on the 3.0.1 train?
I think the plan is to cut 3.0.1 pretty soon since there are some almost show-stopper bugs in 3.0, so I'd leave it targeted to 3.1 and once it's in we can discuss if backporting makes sense.
+1 for @holdenk's comment. Also, in general, we cannot backport improvement JIRAs.
Gentle ping, @skonto. Apache Spark 3.1.0 Feature Freeze is scheduled for early Nov 2020.
@dongjoon-hyun I will try to update, sorry it has been too long, my bad.
Gentle ping, @skonto.
ack
Kubernetes integration test starting
Kubernetes integration test status success
@dongjoon-hyun I finally updated it, pls review. All tests pass.
Test build #131286 has finished for PR 25870 at commit
Thank you so much for the updates, @skonto!
}

def createZipFile(inFile: String, outFile: String): Unit = {
  val fileToZip = new File(inFile)
indentation?
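For reference, a self-contained sketch of what such a helper can look like; this is not the merged code, just a plain `java.util.zip` version that zips a single file:

```scala
import java.io.{File, FileInputStream, FileOutputStream}
import java.util.zip.{ZipEntry, ZipOutputStream}

def createZipFile(inFile: String, outFile: String): Unit = {
  val fileToZip = new File(inFile)
  val fis = new FileInputStream(fileToZip)
  val zipOut = new ZipOutputStream(new FileOutputStream(outFile))
  try {
    // One entry named after the input file; stream its bytes into the archive.
    zipOut.putNextEntry(new ZipEntry(fileToZip.getName))
    val buffer = new Array[Byte](4096)
    Iterator.continually(fis.read(buffer)).takeWhile(_ != -1).foreach { n =>
      zipOut.write(buffer, 0, n)
    }
    zipOut.closeEntry()
  } finally {
    zipOut.close()
    fis.close()
  }
}
```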
dongjoon-hyun left a comment:
+1, LGTM. Thank you, @skonto and all.
I'll fix the indentation and merge this.
Merged to master for Apache Spark 3.1.0.
Just saw your comment, thanks!!!
…tainer`

### What changes were proposed in this pull request?
This PR aims to simplify the steps to re-write the primary resource in a k8s spark application.

### Why are the changes needed?
Re-writing the primary resource uses `renameMainAppResource` twice.
* The first `renameMainAppResource` in `baseDriverContainer` is introduced by #23546
* #25870 refactors `renameMainAppResource` and introduces `renameMainAppResource` in `configureForJava`. The refactoring and `renameMainAppResource` in `configureForJava` make `renameMainAppResource` in `baseDriverContainer` useless.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
* Pass the GA.
* Pass k8s IT.
```
$ build/sbt -Pkubernetes -Pkubernetes-integration-tests -Dtest.exclude.tags=r -Dspark.kubernetes.test.imageRepo=kubespark "kubernetes-integration-tests/test"
[info] KubernetesSuite:
[info] - Run SparkPi with no resources (17 seconds, 443 milliseconds)
[info] - Run SparkPi with no resources & statefulset allocation (17 seconds, 858 milliseconds)
[info] - Run SparkPi with a very long application name. (30 seconds, 450 milliseconds)
[info] - Use SparkLauncher.NO_RESOURCE (18 seconds, 596 milliseconds)
[info] - Run SparkPi with a master URL without a scheme. (18 seconds, 534 milliseconds)
[info] - Run SparkPi with an argument. (21 seconds, 853 milliseconds)
[info] - Run SparkPi with custom labels, annotations, and environment variables. (14 seconds, 285 milliseconds)
[info] - All pods have the same service account by default (13 seconds, 800 milliseconds)
[info] - Run extraJVMOptions check on driver (7 seconds, 825 milliseconds)
[info] - Run SparkRemoteFileTest using a remote data file (15 seconds, 242 milliseconds)
[info] - Verify logging configuration is picked from the provided SPARK_CONF_DIR/log4j2.properties (15 seconds, 491 milliseconds)
[info] - Run SparkPi with env and mount secrets. (26 seconds, 967 milliseconds)
[info] - Run PySpark on simple pi.py example (20 seconds, 318 milliseconds)
[info] - Run PySpark to test a pyfiles example (25 seconds, 659 milliseconds)
[info] - Run PySpark with memory customization (25 seconds, 608 milliseconds)
[info] - Run in client mode. (14 seconds, 620 milliseconds)
[info] - Start pod creation from template (19 seconds, 916 milliseconds)
[info] - SPARK-38398: Schedule pod creation from template (19 seconds, 966 milliseconds)
[info] - PVs with local hostpath storage on statefulsets (22 seconds, 380 milliseconds)
[info] - PVs with local hostpath and storageClass on statefulsets (26 seconds, 935 milliseconds)
[info] - PVs with local storage (30 seconds, 75 milliseconds)
[info] - Launcher client dependencies (2 minutes, 48 seconds)
[info] - SPARK-33615: Launcher client archives (1 minute, 26 seconds)
[info] - SPARK-33748: Launcher python client respecting PYSPARK_PYTHON (1 minute, 47 seconds)
[info] - SPARK-33748: Launcher python client respecting spark.pyspark.python and spark.pyspark.driver.python (1 minute, 51 seconds)
[info] - Launcher python client dependencies using a zip file (1 minute, 51 seconds)
[info] - Test basic decommissioning (59 seconds, 765 milliseconds)
[info] - Test basic decommissioning with shuffle cleanup (1 minute, 3 seconds)
[info] - Test decommissioning with dynamic allocation & shuffle cleanups (2 minutes, 58 seconds)
[info] - Test decommissioning timeouts (58 seconds, 754 milliseconds)
[info] - SPARK-37576: Rolling decommissioning (1 minute, 15 seconds)
[info] Run completed in 29 minutes, 15 seconds.
[info] Total number of tests run: 31
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 31, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 2020 s (33:40), completed 2022-4-2 12:35:52
```
PS. #23546 introduces the deleted code and `DepsTestsSuite`. `DepsTestsSuite` can check the re-write of the primary resource. This PR passes `DepsTestsSuite`, which proves that deleting `renameMainAppResource` in `baseDriverContainer` does not affect the re-write of the primary resource.

Closes #36044 from dcoliversun/SPARK-38770.
Authored-by: Qian.Sun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
Supports python client deps from the launcher fs.
Why are the changes needed?
This is a feature that was already added for java deps; this PR adds support for python as well.
Does this PR introduce any user-facing change?
Yes.
How was this patch tested?
Manually, by running different scenarios and examining the driver & executor logs. There is also an integration test added.
I verified that the python resources are added to the spark file server and that they are named properly so they don't fail the executors. Note that, as before, the following will not work:
primary resource A.py: uses a closure defined in a submitted pyfile B.py; context.py only adds to the python path files with certain extensions, e.g. zip, egg, jar.