Conversation

@antonipp (Contributor)
What changes were proposed in this pull request?

Backport #38376 to branch-3.2

You can find a detailed description of the issue and an example reproduction on the Jira card: https://issues.apache.org/jira/browse/SPARK-40817

The idea of this fix is to update the logic which uploads user-specified files (via spark.jars, spark.files, etc.) to spark.kubernetes.file.upload.path. Previously, after uploading local files, it overwrote the initial list of URIs passed by the user and thus erased any remote URIs that were specified there.

Small example of this behaviour:

  1. The user sets the value of spark.jars to s3a://some-bucket/my-application.jar,/tmp/some-local-jar.jar when running spark-submit in cluster mode
  2. BasicDriverFeatureStep.getAdditionalPodSystemProperties() gets called at one point while running spark-submit
  3. This function would set spark.jars to a new value of ${SPARK_KUBERNETES_UPLOAD_PATH}/spark-upload-${RANDOM_STRING}/some-local-jar.jar. Note that s3a://some-bucket/my-application.jar has been discarded.

With the logic proposed in this PR, the new value of spark.jars would be s3a://some-bucket/my-application.jar,${SPARK_KUBERNETES_UPLOAD_PATH}/spark-upload-${RANDOM_STRING}/some-local-jar.jar; in other words, remote URIs are no longer discarded.
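The merging behaviour described above can be sketched as follows. This is an illustrative Python sketch, not the actual Scala implementation in `BasicDriverFeatureStep`; the helper names (`is_local`, `upload_file`) and the `spark-upload-RANDOM` directory name are assumptions made for the example:

```python
from urllib.parse import urlparse

def is_local(uri: str) -> bool:
    """Treat bare paths and file:// URIs as local files that need uploading."""
    return urlparse(uri).scheme in ("", "file")

def upload_file(path: str, upload_path: str) -> str:
    """Stand-in for the real upload: returns the remote URI the file would
    get under spark.kubernetes.file.upload.path (random suffix elided)."""
    filename = path.rsplit("/", 1)[-1]
    return f"{upload_path}/spark-upload-RANDOM/{filename}"

def resolve_jars(jars: str, upload_path: str) -> str:
    # The buggy logic kept only the uploaded local files; the fixed logic
    # preserves remote URIs and substitutes uploads for local ones only.
    resolved = []
    for uri in jars.split(","):
        if is_local(uri):
            resolved.append(upload_file(uri, upload_path))
        else:
            resolved.append(uri)  # remote URI: keep as-is
    return ",".join(resolved)

jars = "s3a://some-bucket/my-application.jar,/tmp/some-local-jar.jar"
print(resolve_jars(jars, "s3a://bucket/upload"))
# s3a://some-bucket/my-application.jar,s3a://bucket/upload/spark-upload-RANDOM/some-local-jar.jar
```

The key design point is that local and remote URIs are resolved element-by-element rather than replacing the whole list with the upload results.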

Why are the changes needed?

We encountered this issue in production when trying to launch Spark on Kubernetes jobs in cluster mode with a mix of local and remote dependencies.

Does this PR introduce any user-facing change?

Yes, see the description of the new behaviour above.

How was this patch tested?

  • Added a unit test for the new behaviour
  • Added an integration test for the new behaviour
  • Tried this patch in our Kubernetes environment with SparkPi:
```shell
spark-submit \
  --master k8s://https://$KUBERNETES_API_SERVER_URL:443 \
  --deploy-mode cluster \
  --name=spark-submit-test \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.jars=/opt/my-local-jar.jar,s3a://$BUCKET_NAME/my-remote-jar.jar \
  --conf spark.kubernetes.file.upload.path=s3a://$BUCKET_NAME/my-upload-path/ \
  [...]
  /opt/spark/examples/jars/spark-examples_2.12-3.1.3.jar
```

Before applying the patch, s3a://$BUCKET_NAME/my-remote-jar.jar was discarded from the final value of spark.jars. After applying the patch and launching the job again, I confirmed that s3a://$BUCKET_NAME/my-remote-jar.jar was no longer discarded by looking at the Spark config for the running job.

@dongjoon-hyun (Member) left a comment

+1, LGTM. Merged to branch-3.2.

dongjoon-hyun pushed a commit that referenced this pull request Jan 20, 2023
Closes #39670 from antonipp/spark-40817-branch-3.2.

Authored-by: Anton Ippolitov <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
sunchao pushed a commit to sunchao/spark that referenced this pull request Jun 2, 2023
Closes apache#39670 from antonipp/spark-40817-branch-3.2.

Authored-by: Anton Ippolitov <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 68fb5c4)