Conversation

@skonto
Contributor

@skonto skonto commented Sep 20, 2019

What changes were proposed in this pull request?

Adds support for python client deps from the launcher filesystem.

Why are the changes needed?

This feature was already added for java deps. This PR adds support for python as well.

Does this PR introduce any user-facing change?

yes

How was this patch tested?

Manually, by running different scenarios and examining the driver & executor logs. An integration test is also added.
I verified that the python resources are added to the Spark file server and that they are named properly so they don't fail the executors. Note that, as before, the following will not work:
a primary resource A.py that uses a closure defined in a submitted pyfile B.py, because context.py only adds files with certain extensions (e.g. zip, egg, jar) to the PYTHONPATH; see the sketch below.
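For illustration, a minimal Scala sketch of that extension gate (the actual check lives in PySpark's context.py; the names here are mine):

```
// Illustrative only: PySpark puts a submitted file on the executors'
// PYTHONPATH only when it carries one of these extensions.
val pythonPathExtensions = Seq(".zip", ".egg", ".jar")

def landsOnPythonPath(fileName: String): Boolean =
  pythonPathExtensions.exists(ext => fileName.endsWith(ext))

// landsOnPythonPath("deps.zip") // true: contents become importable
// landsOnPythonPath("B.py")     // false: closures in B.py won't resolve
```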

@SparkQA

SparkQA commented Sep 20, 2019

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/16163/

@SparkQA

SparkQA commented Sep 20, 2019

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/16163/

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-27936][Kubernetes] support python deps [SPARK-27936][K8S] support python deps Sep 20, 2019
@SparkQA

SparkQA commented Sep 20, 2019

Test build #111078 has finished for PR 25870 at commit 40fe23c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@skonto
Contributor Author

skonto commented Sep 20, 2019

@erikerlandson @holdenk please review. I know there are some other changes (other PRs) related to the tests; I can always rebase if they get merged first.

.set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
.set("spark.jars.packages", "com.amazonaws:aws-java-sdk:" +
"1.7.4,org.apache.hadoop:hadoop-aws:2.7.6")
.set("spark.driver.extraJavaOptions", "-Divy.cache.dir=/tmp -Divy.home=/tmp")
Contributor Author

I can refactor this part so the properties are set once, as they are shared with the existing test. In general I think we should separate the suites in the future to allow better setup of before/after conditions.
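Something like the following rough sketch could hold the shared properties (`SparkAppConf` is the integration-test harness type; the helper name is hypothetical):

```
// Rough sketch: centralize the object-store properties shared by the
// java and python dependency tests.
private def withDepsTestConf(conf: SparkAppConf): SparkAppConf =
  conf
    .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .set("spark.jars.packages",
      "com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.6")
    .set("spark.driver.extraJavaOptions", "-Divy.cache.dir=/tmp -Divy.home=/tmp")
```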

@holdenk
Contributor

holdenk commented Sep 21, 2019

cc @ifilonenko who I think did some related work?

@skonto
Contributor Author

skonto commented Sep 30, 2019

@erikerlandson if you have some free time please have a look :)

@erikerlandson
Contributor

@skonto this LGTM

Contributor

@holdenk holdenk left a comment

Follow up question re: uploading.

@skonto
Contributor Author

skonto commented Oct 11, 2019

@holdenk this is because spark-submit adds the resource to the spark.jars property by default; check below:

19/10/11 13:01:30 WARN Utils: Your hostname, universe resolves to a loopback address: 127.0.1.1; using 192.168.2.4 instead (on interface wlp2s0)
19/10/11 13:01:30 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Parsed arguments:
  master                  k8s://https://10.96.0.1:443
  deployMode              cluster
  executorMemory          1G
  executorCores           null
  totalExecutorCores      null
  propertiesFile          null
  driverMemory            1G
  driverCores             null
  driverExtraClassPath    null
  driverExtraLibraryPath  null
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            2
  files                   null
  pyFiles                 null
  archives                null
  mainClass               org.apache.spark.examples.SparkPi
  primaryResource         local:///opt/spark/examples/jars/spark-examples_2.12-2.1.2-2.4.4-lightbend.jar
  name                    spark-pi
  childArgs               [100]
  jars                    null
  packages                null
  packagesExclusions      null
  repositories            null
  verbose                 true

Spark properties used, including those specified through
 --conf and those from the properties file null:
  (spark.kubernetes.driver.pod.name,spark-pi-driver)
  (spark.executor.instances,2)
  (spark.driver.memory,1G)
  (spark.executor.memory,1G)
  (spark.kubernetes.authenticate.driver.serviceAccountName,spark-sa)
  (spark.kubernetes.namespace,spark)
  (spark.kubernetes.container.image,lightbend/spark:2.1.2-OpenShift-2.4.4-rh-2.12)
  (spark.kubernetes.container.image.pullPolicy,Always)

    
19/10/11 13:01:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Main class:
org.apache.spark.deploy.k8s.submit.KubernetesClientApplication
Arguments:
--primary-java-resource
local:///opt/spark/examples/jars/spark-examples_2.12-2.1.2-2.4.4-lightbend.jar
--main-class
org.apache.spark.examples.SparkPi
--arg
100
Spark config:
(spark.kubernetes.namespace,spark)
(spark.jars,local:///opt/spark/examples/jars/spark-examples_2.12-2.1.2-2.4.4-lightbend.jar)
(spark.app.name,spark-pi)
(spark.driver.memory,1G)
(spark.executor.instances,2)
(spark.submit.pyFiles,)
(spark.kubernetes.container.image.pullPolicy,Always)
(spark.kubernetes.container.image,lightbend/spark:2.1.2-OpenShift-2.4.4-rh-2.12)
(spark.submit.deployMode,cluster)
(spark.master,k8s://https://10.96.0.1:443)
(spark.kubernetes.authenticate.driver.serviceAccountName,spark-sa)
(spark.executor.memory,1G)
(spark.kubernetes.driver.pod.name,spark-pi-driver)
Classpath elements:

So since this PR uploads whatever is needed from spark.jars, we don't need to do double work. For python it's different, as technically the main resource is not a jar.
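A hypothetical sketch of that rule (not the PR's actual code): uploading everything in spark.jars already covers a jar primary resource, while a .py primary resource must be added explicitly:

```
// Hypothetical illustration: spark-submit folds a jar primary resource into
// spark.jars, so uploading spark.jars covers it; a .py primary resource never
// reaches spark.jars and therefore needs explicit handling.
def resourcesToUpload(primaryResource: String, sparkJars: Seq[String]): Seq[String] =
  if (primaryResource.endsWith(".py")) sparkJars :+ primaryResource
  else sparkJars
```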

@skonto skonto force-pushed the python-deps branch 2 times, most recently from 3245e4e to 4ed8524 on October 16, 2019 13:23
@skonto
Contributor Author

skonto commented Oct 16, 2019

@erikerlandson I added one test which uses a zip file as we discussed previously:

ls tests/
py_container_checks.py   py_container_checks.zip  pyfiles.py               worker_memory_check.py

Run starting. Expected test count is: 3
KubernetesSuite:
- Launcher java client dependencies
- Launcher python client dependencies using py
- Launcher python client dependencies using a zip file

In general there are many ways to set up the python env for a python Spark job on the driver/executor side: https://medium.com/criteo-labs/packaging-code-with-pex-a-pyspark-example-9057f9f144f3
but at least we cover the basic ones.
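For reference, a rough sketch (paths and helper values hypothetical) of how the zip case can be wired up in the suite's style, shipping the archive via spark.submit.pyFiles so it lands on the PYTHONPATH:

```
// Rough sketch, not the exact test code: ship a zip of helper modules
// alongside the primary .py resource. `hostFilesDir` is hypothetical.
sparkAppConf
  .set("spark.kubernetes.container.image", pyImage)
  .set("spark.submit.pyFiles", s"$hostFilesDir/py_container_checks.zip")
```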

@erikerlandson
Contributor

@erikerlandson I added one test which uses a zip file as we discussed previously:

Thanks @skonto!

@SparkQA

SparkQA commented Oct 16, 2019

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/17161/

@SparkQA

SparkQA commented Oct 16, 2019

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/17161/

@SparkQA

SparkQA commented Oct 16, 2019

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/17163/

@SparkQA

SparkQA commented Oct 16, 2019

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/17163/

@SparkQA

SparkQA commented Oct 16, 2019

Test build #112171 has finished for PR 25870 at commit 4ed8524.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 16, 2019

Test build #112169 has finished for PR 25870 at commit 3245e4e.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

Contributor

@holdenk holdenk left a comment

Minor comment about Python 2, but really excited to see this.

val (accessKey, secretKey) = getCephCredentials()
sparkAppConf
.set("spark.kubernetes.container.image", pyImage)
.set("spark.kubernetes.pyspark.pythonVersion", "2")
Contributor

Let's not use Python 2, since this is targeted for Spark 3, I believe.

Contributor Author

I got used to 2 for years, I'll miss it :) Sure, will change.

@SparkQA

SparkQA commented Oct 18, 2019

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/17258/

@SparkQA

SparkQA commented Oct 18, 2019

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/17258/

@SparkQA

SparkQA commented Oct 18, 2019

Test build #112275 has finished for PR 25870 at commit f8df227.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@skonto
Contributor Author

skonto commented Oct 21, 2019

@erikerlandson can I get a merge? Gentle ping :)

@SparkQA

SparkQA commented May 15, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/27309/

@SparkQA

SparkQA commented May 15, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/27312/

@dongjoon-hyun
Member

Retest this please

@SparkQA

SparkQA commented May 17, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/27406/

@SparkQA

SparkQA commented May 17, 2020

Test build #122747 has finished for PR 25870 at commit a9f055b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 17, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/27406/

@skonto
Contributor Author

skonto commented Jul 16, 2020

Back to this.

@holdenk
Contributor

holdenk commented Jul 17, 2020

I'm also back after my motorcycle crash last year, let me know when it's up to date and I'd be happy to take another look. Thanks for sticking with this.

@erikerlandson
Contributor

I notice this is targeted for 3.1; is it worth trying to get on the 3.0.1 train?

@holdenk
Contributor

holdenk commented Jul 19, 2020

I think the plan is to cut 3.0.1 pretty soon, since there are some almost-show-stopper bugs in 3.0, so I'd leave it targeted at 3.1; once it's in we can discuss whether backporting makes sense.

@dongjoon-hyun
Member

+1 for @holdenk's comment. Also, in general, we cannot backport improvement JIRAs.

@dongjoon-hyun
Member

Gentle ping, @skonto. Apache Spark 3.1.0 feature freeze is scheduled for early Nov 2020.

@skonto
Contributor Author

skonto commented Oct 14, 2020

@dongjoon-hyun I will try to update it; sorry it has been too long, my bad.

@dongjoon-hyun
Member

Gentle ping, @skonto .

@skonto
Contributor Author

skonto commented Nov 18, 2020

ack

@SparkQA

SparkQA commented Nov 18, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35889/

@SparkQA

SparkQA commented Nov 18, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35889/

@skonto
Contributor Author

skonto commented Nov 18, 2020

@dongjoon-hyun I finally updated it, please review. All tests pass.

@SparkQA

SparkQA commented Nov 18, 2020

Test build #131286 has finished for PR 25870 at commit 0145183.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Thank you so much for updates, @skonto !

}

def createZipFile(inFile: String, outFile: String): Unit = {
val fileToZip = new File(inFile)
Member

indentation?
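For context, a complete version of such a helper might look like this sketch using java.util.zip (not necessarily the PR's exact code):

```
import java.io.{File, FileInputStream, FileOutputStream}
import java.util.zip.{ZipEntry, ZipOutputStream}

// Sketch: zip a single file into outFile, keeping only its base name
// as the entry name.
def createZipFile(inFile: String, outFile: String): Unit = {
  val fileToZip = new File(inFile)
  val fis = new FileInputStream(fileToZip)
  val zos = new ZipOutputStream(new FileOutputStream(outFile))
  try {
    zos.putNextEntry(new ZipEntry(fileToZip.getName))
    val buffer = new Array[Byte](4096)
    var read = fis.read(buffer)
    while (read > 0) {
      zos.write(buffer, 0, read)
      read = fis.read(buffer)
    }
    zos.closeEntry()
  } finally {
    zos.close()
    fis.close()
  }
}
```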

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Thank you, @skonto and all.
I'll fix the indentation and merge this.

Merged to master for Apache Spark 3.1.0.

@skonto
Contributor Author

skonto commented Nov 18, 2020

Just saw your comment thanks!!!

dongjoon-hyun pushed a commit that referenced this pull request Apr 3, 2022
…tainer`

### What changes were proposed in this pull request?

This PR aims to simplify the steps that re-write the primary resource in k8s spark applications.

### Why are the changes needed?

Re-writing the primary resource uses `renameMainAppResource` twice.
* The first `renameMainAppResource`, in `baseDriverContainer`, was introduced by #23546.
* #25870 refactored `renameMainAppResource` and introduced `renameMainAppResource` in `configureForJava`.

That refactoring and the `renameMainAppResource` in `configureForJava` make the `renameMainAppResource` in `baseDriverContainer` useless.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

* Pass the GA.
* Pass k8s IT.
```
$ build/sbt -Pkubernetes -Pkubernetes-integration-tests -Dtest.exclude.tags=r -Dspark.kubernetes.test.imageRepo=kubespark "kubernetes-integration-tests/test"
[info] KubernetesSuite:
[info] - Run SparkPi with no resources (17 seconds, 443 milliseconds)
[info] - Run SparkPi with no resources & statefulset allocation (17 seconds, 858 milliseconds)
[info] - Run SparkPi with a very long application name. (30 seconds, 450 milliseconds)
[info] - Use SparkLauncher.NO_RESOURCE (18 seconds, 596 milliseconds)
[info] - Run SparkPi with a master URL without a scheme. (18 seconds, 534 milliseconds)
[info] - Run SparkPi with an argument. (21 seconds, 853 milliseconds)
[info] - Run SparkPi with custom labels, annotations, and environment variables. (14 seconds, 285 milliseconds)
[info] - All pods have the same service account by default (13 seconds, 800 milliseconds)
[info] - Run extraJVMOptions check on driver (7 seconds, 825 milliseconds)
[info] - Run SparkRemoteFileTest using a remote data file (15 seconds, 242 milliseconds)
[info] - Verify logging configuration is picked from the provided SPARK_CONF_DIR/log4j2.properties (15 seconds, 491 milliseconds)
[info] - Run SparkPi with env and mount secrets. (26 seconds, 967 milliseconds)
[info] - Run PySpark on simple pi.py example (20 seconds, 318 milliseconds)
[info] - Run PySpark to test a pyfiles example (25 seconds, 659 milliseconds)
[info] - Run PySpark with memory customization (25 seconds, 608 milliseconds)
[info] - Run in client mode. (14 seconds, 620 milliseconds)
[info] - Start pod creation from template (19 seconds, 916 milliseconds)
[info] - SPARK-38398: Schedule pod creation from template (19 seconds, 966 milliseconds)
[info] - PVs with local hostpath storage on statefulsets (22 seconds, 380 milliseconds)
[info] - PVs with local hostpath and storageClass on statefulsets (26 seconds, 935 milliseconds)
[info] - PVs with local storage (30 seconds, 75 milliseconds)
[info] - Launcher client dependencies (2 minutes, 48 seconds)
[info] - SPARK-33615: Launcher client archives (1 minute, 26 seconds)
[info] - SPARK-33748: Launcher python client respecting PYSPARK_PYTHON (1 minute, 47 seconds)
[info] - SPARK-33748: Launcher python client respecting spark.pyspark.python and spark.pyspark.driver.python (1 minute, 51 seconds)
[info] - Launcher python client dependencies using a zip file (1 minute, 51 seconds)
[info] - Test basic decommissioning (59 seconds, 765 milliseconds)
[info] - Test basic decommissioning with shuffle cleanup (1 minute, 3 seconds)
[info] - Test decommissioning with dynamic allocation & shuffle cleanups (2 minutes, 58 seconds)
[info] - Test decommissioning timeouts (58 seconds, 754 milliseconds)
[info] - SPARK-37576: Rolling decommissioning (1 minute, 15 seconds)
[info] Run completed in 29 minutes, 15 seconds.
[info] Total number of tests run: 31
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 31, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 2020 s (33:40), completed 2022-4-2 12:35:52
```
PS. #23546 introduced the now-deleted code as well as `DepsTestsSuite`, which exercises primary-resource re-writing. This PR passes `DepsTestsSuite`, which shows that deleting `renameMainAppResource` from `baseDriverContainer` does not affect primary-resource re-writing.

Closes #36044 from dcoliversun/SPARK-38770.

Authored-by: Qian.Sun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>