Conversation

@skonto
Contributor

@skonto skonto commented Sep 20, 2019

What changes were proposed in this pull request?

Adds support for python client deps from the launcher filesystem.

Why are the changes needed?

This feature was already added for java deps. This PR adds support for python as well.

Does this PR introduce any user-facing change?

yes

How was this patch tested?

Manually, by running different scenarios and examining the driver & executor logs. An integration test is also added.
I verified that the python resources are added to the Spark file server and that they are named properly so they don't fail the executors. Note that, as before, the following will not work:
a primary resource A.py that uses a closure defined in a submitted pyfile B.py, because context.py only adds files with certain extensions (e.g. zip, egg, jar) to the PYTHONPATH; see the sketch below.
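For illustration, a minimal Scala sketch of that extension gate (the actual check lives in PySpark's context.py; the names here are mine):

```
// Illustrative only: PySpark puts a submitted file on the executors'
// PYTHONPATH only when it carries one of these extensions.
val pythonPathExtensions = Seq(".zip", ".egg", ".jar")

def landsOnPythonPath(fileName: String): Boolean =
  pythonPathExtensions.exists(ext => fileName.endsWith(ext))

// landsOnPythonPath("deps.zip") // true: contents become importable
// landsOnPythonPath("B.py")     // false: closures in B.py won't resolve
```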

@SparkQA

SparkQA commented Sep 20, 2019

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/16163/

@SparkQA

SparkQA commented Sep 20, 2019

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/16163/

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-27936][Kubernetes] support python deps [SPARK-27936][K8S] support python deps Sep 20, 2019
@SparkQA

SparkQA commented Sep 20, 2019

Test build #111078 has finished for PR 25870 at commit 40fe23c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@skonto
Contributor Author

skonto commented Sep 20, 2019

@erikerlandson @holdenk please review. I know there are some other changes (other PRs) related to the tests; I can always rebase if they get merged first.

.set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
.set("spark.jars.packages", "com.amazonaws:aws-java-sdk:" +
"1.7.4,org.apache.hadoop:hadoop-aws:2.7.6")
.set("spark.driver.extraJavaOptions", "-Divy.cache.dir=/tmp -Divy.home=/tmp")
Contributor Author

I can refactor this part so the properties are set once, as they are shared with the existing test. In general I think we should separate the suites in the future to allow better setup of before/after conditions.
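Something like the following rough sketch could hold the shared properties (`SparkAppConf` is the integration-test harness type; the helper name is hypothetical):

```
// Rough sketch: centralize the object-store properties shared by the
// java and python dependency tests.
private def withDepsTestConf(conf: SparkAppConf): SparkAppConf =
  conf
    .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .set("spark.jars.packages",
      "com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.6")
    .set("spark.driver.extraJavaOptions", "-Divy.cache.dir=/tmp -Divy.home=/tmp")
```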

@holdenk
Contributor

holdenk commented Sep 21, 2019

cc @ifilonenko who I think did some related work?

@skonto
Contributor Author

skonto commented Sep 30, 2019

@erikerlandson if you have some free time please have a look :)

@erikerlandson
Contributor

@skonto this LGTM

Contributor

@holdenk holdenk left a comment

Follow up question re: uploading.

@skonto
Contributor Author

skonto commented Oct 11, 2019

@holdenk this is because spark-submit adds the resource to the spark.jars property by default; check below:

19/10/11 13:01:30 WARN Utils: Your hostname, universe resolves to a loopback address: 127.0.1.1; using 192.168.2.4 instead (on interface wlp2s0)
19/10/11 13:01:30 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Parsed arguments:
  master                  k8s://https://10.96.0.1:443
  deployMode              cluster
  executorMemory          1G
  executorCores           null
  totalExecutorCores      null
  propertiesFile          null
  driverMemory            1G
  driverCores             null
  driverExtraClassPath    null
  driverExtraLibraryPath  null
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            2
  files                   null
  pyFiles                 null
  archives                null
  mainClass               org.apache.spark.examples.SparkPi
  primaryResource         local:///opt/spark/examples/jars/spark-examples_2.12-2.1.2-2.4.4-lightbend.jar
  name                    spark-pi
  childArgs               [100]
  jars                    null
  packages                null
  packagesExclusions      null
  repositories            null
  verbose                 true

Spark properties used, including those specified through
 --conf and those from the properties file null:
  (spark.kubernetes.driver.pod.name,spark-pi-driver)
  (spark.executor.instances,2)
  (spark.driver.memory,1G)
  (spark.executor.memory,1G)
  (spark.kubernetes.authenticate.driver.serviceAccountName,spark-sa)
  (spark.kubernetes.namespace,spark)
  (spark.kubernetes.container.image,lightbend/spark:2.1.2-OpenShift-2.4.4-rh-2.12)
  (spark.kubernetes.container.image.pullPolicy,Always)

    
19/10/11 13:01:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Main class:
org.apache.spark.deploy.k8s.submit.KubernetesClientApplication
Arguments:
--primary-java-resource
local:///opt/spark/examples/jars/spark-examples_2.12-2.1.2-2.4.4-lightbend.jar
--main-class
org.apache.spark.examples.SparkPi
--arg
100
Spark config:
(spark.kubernetes.namespace,spark)
(spark.jars,local:///opt/spark/examples/jars/spark-examples_2.12-2.1.2-2.4.4-lightbend.jar)
(spark.app.name,spark-pi)
(spark.driver.memory,1G)
(spark.executor.instances,2)
(spark.submit.pyFiles,)
(spark.kubernetes.container.image.pullPolicy,Always)
(spark.kubernetes.container.image,lightbend/spark:2.1.2-OpenShift-2.4.4-rh-2.12)
(spark.submit.deployMode,cluster)
(spark.master,k8s://https://10.96.0.1:443)
(spark.kubernetes.authenticate.driver.serviceAccountName,spark-sa)
(spark.executor.memory,1G)
(spark.kubernetes.driver.pod.name,spark-pi-driver)
Classpath elements:

So since this PR uploads whatever is needed from spark.jars, we don't need to do double work. For python it's different, as technically the main resource is not a jar.
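A hypothetical sketch of that rule (not the PR's actual code): uploading everything in spark.jars already covers a jar primary resource, while a .py primary resource must be added explicitly:

```
// Hypothetical illustration: spark-submit folds a jar primary resource into
// spark.jars, so uploading spark.jars covers it; a .py primary resource never
// reaches spark.jars and therefore needs explicit handling.
def resourcesToUpload(primaryResource: String, sparkJars: Seq[String]): Seq[String] =
  if (primaryResource.endsWith(".py")) sparkJars :+ primaryResource
  else sparkJars
```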

@skonto skonto force-pushed the python-deps branch 2 times, most recently from 3245e4e to 4ed8524 on October 16, 2019 13:23
@skonto
Contributor Author

skonto commented Oct 16, 2019

@erikerlandson I added one test which uses a zip file as we discussed previously:

ls tests/
py_container_checks.py   py_container_checks.zip  pyfiles.py               worker_memory_check.py

Run starting. Expected test count is: 3
KubernetesSuite:
- Launcher java client dependencies
- Launcher python client dependencies using py
- Launcher python client dependencies using a zip file

In general there are many ways to set up the python env for a python Spark job on the driver/executor side: https://medium.com/criteo-labs/packaging-code-with-pex-a-pyspark-example-9057f9f144f3
but at least we cover the basic ones.
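For reference, a rough sketch (paths and helper values hypothetical) of how the zip case can be wired up in the suite's style, shipping the archive via spark.submit.pyFiles so it lands on the PYTHONPATH:

```
// Rough sketch, not the exact test code: ship a zip of helper modules
// alongside the primary .py resource. `hostFilesDir` is hypothetical.
sparkAppConf
  .set("spark.kubernetes.container.image", pyImage)
  .set("spark.submit.pyFiles", s"$hostFilesDir/py_container_checks.zip")
```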

@erikerlandson
Contributor

@erikerlandson I added one test which uses a zip file as we discussed previously:

Thanks @skonto!

@SparkQA

SparkQA commented Oct 16, 2019

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/17161/

@SparkQA

SparkQA commented Oct 16, 2019

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/17161/

@SparkQA

SparkQA commented Oct 16, 2019

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/17163/

@SparkQA

SparkQA commented Oct 16, 2019

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/17163/

@SparkQA

SparkQA commented Oct 16, 2019

Test build #112171 has finished for PR 25870 at commit 4ed8524.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 16, 2019

Test build #112169 has finished for PR 25870 at commit 3245e4e.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

Contributor

@holdenk holdenk left a comment

Minor comment about Python 2, but really excited to see this.

val (accessKey, secretKey) = getCephCredentials()
sparkAppConf
.set("spark.kubernetes.container.image", pyImage)
.set("spark.kubernetes.pyspark.pythonVersion", "2")
Contributor

Let's not use Python 2, since this is targeted for Spark 3, I believe.

Contributor Author

I got used to 2 for years, I'll miss it :) Sure, will change.

@SparkQA

SparkQA commented Oct 18, 2019

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/17258/

@SparkQA

SparkQA commented Oct 18, 2019

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/17258/

@SparkQA

SparkQA commented Oct 18, 2019

Test build #112275 has finished for PR 25870 at commit f8df227.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@skonto
Contributor Author

skonto commented Oct 21, 2019

@erikerlandson can I get a merge? Gentle ping :)

@SparkQA

SparkQA commented May 15, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/27309/

@SparkQA

SparkQA commented May 15, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/27312/

@dongjoon-hyun
Member

Retest this please

@SparkQA

SparkQA commented May 17, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/27406/

@SparkQA

SparkQA commented May 17, 2020

Test build #122747 has finished for PR 25870 at commit a9f055b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 17, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/27406/

@skonto
Contributor Author

skonto commented Jul 16, 2020

Back to this.

@holdenk
Contributor

holdenk commented Jul 17, 2020

I'm also back after my motorcycle crash last year, let me know when it's up to date and I'd be happy to take another look. Thanks for sticking with this.

@erikerlandson
Contributor

I notice this is targeted for 3.1; is it worth trying to get on the 3.0.1 train?

@holdenk
Contributor

holdenk commented Jul 19, 2020

I think the plan is to cut 3.0.1 pretty soon, since there are some almost-show-stopper bugs in 3.0, so I'd leave it targeted at 3.1; once it's in we can discuss whether backporting makes sense.

@dongjoon-hyun
Member

+1 for @holdenk's comment. Also, in general, we cannot backport improvement JIRAs.

@dongjoon-hyun
Member

Gentle ping, @skonto. Apache Spark 3.1.0 feature freeze is scheduled for early Nov 2020.

@skonto
Contributor Author

skonto commented Oct 14, 2020

@dongjoon-hyun I will try to update it; sorry it has been too long, my bad.

@dongjoon-hyun
Member

Gentle ping, @skonto .

@skonto
Contributor Author

skonto commented Nov 18, 2020

ack

@SparkQA

SparkQA commented Nov 18, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35889/

@SparkQA

SparkQA commented Nov 18, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35889/

@skonto
Contributor Author

skonto commented Nov 18, 2020

@dongjoon-hyun I finally updated it, please review. All tests pass.

@SparkQA

SparkQA commented Nov 18, 2020

Test build #131286 has finished for PR 25870 at commit 0145183.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Thank you so much for updates, @skonto !

}

def createZipFile(inFile: String, outFile: String): Unit = {
val fileToZip = new File(inFile)
Member

indentation?
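For context, a complete version of such a helper might look like this sketch using java.util.zip (not necessarily the PR's exact code):

```
import java.io.{File, FileInputStream, FileOutputStream}
import java.util.zip.{ZipEntry, ZipOutputStream}

// Sketch: zip a single file into outFile, keeping only its base name
// as the entry name.
def createZipFile(inFile: String, outFile: String): Unit = {
  val fileToZip = new File(inFile)
  val fis = new FileInputStream(fileToZip)
  val zos = new ZipOutputStream(new FileOutputStream(outFile))
  try {
    zos.putNextEntry(new ZipEntry(fileToZip.getName))
    val buffer = new Array[Byte](4096)
    var read = fis.read(buffer)
    while (read > 0) {
      zos.write(buffer, 0, read)
      read = fis.read(buffer)
    }
    zos.closeEntry()
  } finally {
    zos.close()
    fis.close()
  }
}
```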

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Thank you, @skonto and all.
I'll fix the indentation and merge this.

Merged to master for Apache Spark 3.1.0.

@skonto
Contributor Author

skonto commented Nov 18, 2020

Just saw your comment thanks!!!

dongjoon-hyun pushed a commit that referenced this pull request Apr 3, 2022
…tainer`

### What changes were proposed in this pull request?

This PR aims to simplify the steps that re-write the primary resource in k8s spark applications.

### Why are the changes needed?

Re-writing the primary resource uses `renameMainAppResource` twice.
* The first `renameMainAppResource`, in `baseDriverContainer`, was introduced by #23546.
* #25870 refactored `renameMainAppResource` and introduced `renameMainAppResource` in `configureForJava`.

That refactoring and the `renameMainAppResource` in `configureForJava` make the `renameMainAppResource` in `baseDriverContainer` useless.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

* Pass the GA.
* Pass k8s IT.
```
$ build/sbt -Pkubernetes -Pkubernetes-integration-tests -Dtest.exclude.tags=r -Dspark.kubernetes.test.imageRepo=kubespark "kubernetes-integration-tests/test"
[info] KubernetesSuite:
[info] - Run SparkPi with no resources (17 seconds, 443 milliseconds)
[info] - Run SparkPi with no resources & statefulset allocation (17 seconds, 858 milliseconds)
[info] - Run SparkPi with a very long application name. (30 seconds, 450 milliseconds)
[info] - Use SparkLauncher.NO_RESOURCE (18 seconds, 596 milliseconds)
[info] - Run SparkPi with a master URL without a scheme. (18 seconds, 534 milliseconds)
[info] - Run SparkPi with an argument. (21 seconds, 853 milliseconds)
[info] - Run SparkPi with custom labels, annotations, and environment variables. (14 seconds, 285 milliseconds)
[info] - All pods have the same service account by default (13 seconds, 800 milliseconds)
[info] - Run extraJVMOptions check on driver (7 seconds, 825 milliseconds)
[info] - Run SparkRemoteFileTest using a remote data file (15 seconds, 242 milliseconds)
[info] - Verify logging configuration is picked from the provided SPARK_CONF_DIR/log4j2.properties (15 seconds, 491 milliseconds)
[info] - Run SparkPi with env and mount secrets. (26 seconds, 967 milliseconds)
[info] - Run PySpark on simple pi.py example (20 seconds, 318 milliseconds)
[info] - Run PySpark to test a pyfiles example (25 seconds, 659 milliseconds)
[info] - Run PySpark with memory customization (25 seconds, 608 milliseconds)
[info] - Run in client mode. (14 seconds, 620 milliseconds)
[info] - Start pod creation from template (19 seconds, 916 milliseconds)
[info] - SPARK-38398: Schedule pod creation from template (19 seconds, 966 milliseconds)
[info] - PVs with local hostpath storage on statefulsets (22 seconds, 380 milliseconds)
[info] - PVs with local hostpath and storageClass on statefulsets (26 seconds, 935 milliseconds)
[info] - PVs with local storage (30 seconds, 75 milliseconds)
[info] - Launcher client dependencies (2 minutes, 48 seconds)
[info] - SPARK-33615: Launcher client archives (1 minute, 26 seconds)
[info] - SPARK-33748: Launcher python client respecting PYSPARK_PYTHON (1 minute, 47 seconds)
[info] - SPARK-33748: Launcher python client respecting spark.pyspark.python and spark.pyspark.driver.python (1 minute, 51 seconds)
[info] - Launcher python client dependencies using a zip file (1 minute, 51 seconds)
[info] - Test basic decommissioning (59 seconds, 765 milliseconds)
[info] - Test basic decommissioning with shuffle cleanup (1 minute, 3 seconds)
[info] - Test decommissioning with dynamic allocation & shuffle cleanups (2 minutes, 58 seconds)
[info] - Test decommissioning timeouts (58 seconds, 754 milliseconds)
[info] - SPARK-37576: Rolling decommissioning (1 minute, 15 seconds)
[info] Run completed in 29 minutes, 15 seconds.
[info] Total number of tests run: 31
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 31, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 2020 s (33:40), completed 2022-4-2 12:35:52
```
PS. #23546 introduced the now-deleted code as well as `DepsTestsSuite`, which exercises primary-resource re-writing. This PR passes `DepsTestsSuite`, which shows that deleting `renameMainAppResource` from `baseDriverContainer` does not affect primary-resource re-writing.

Closes #36044 from dcoliversun/SPARK-38770.

Authored-by: Qian.Sun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>