You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[SPARK-24384][PYTHON][SPARK SUBMIT] Add .py files correctly into PythonRunner in submit with client mode in spark-submit
## What changes were proposed in this pull request?
In client side before context initialization specifically, .py file doesn't work in client side before context initialization when the application is a Python file. See below:
```
$ cat /home/spark/tmp.py
def testtest():
return 1
```
This works:
```
$ cat app.py
import pyspark
pyspark.sql.SparkSession.builder.getOrCreate()
import tmp
print("************************%s" % tmp.testtest())
$ ./bin/spark-submit --master yarn --deploy-mode client --py-files /home/spark/tmp.py app.py
...
************************1
```
but this doesn't:
```
$ cat app.py
import pyspark
import tmp
pyspark.sql.SparkSession.builder.getOrCreate()
print("************************%s" % tmp.testtest())
$ ./bin/spark-submit --master yarn --deploy-mode client --py-files /home/spark/tmp.py app.py
Traceback (most recent call last):
File "/home/spark/spark/app.py", line 2, in <module>
import tmp
ImportError: No module named tmp
```
### How did it happen?
In client mode specifically, the paths are being added into PythonRunner as are:
https://github.com/apache/spark/blob/628c7b517969c4a7ccb26ea67ab3dd61266073ca/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L430https://github.com/apache/spark/blob/628c7b517969c4a7ccb26ea67ab3dd61266073ca/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala#L49-L88
The problem here is, .py file shouldn't be added as are since `PYTHONPATH` expects a directory or an archive like zip or egg.
### How does this PR fix?
We shouldn't simply just add its parent directory because other files in the parent directory could also be added into the `PYTHONPATH` in client mode before context initialization.
Therefore, we copy .py files into a temp directory for .py files and add it to `PYTHONPATH`.
## How was this patch tested?
Unit tests are added and manually tested in both standalond and yarn client modes with submit.
Author: hyukjinkwon <[email protected]>
Closesapache#21426 from HyukjinKwon/SPARK-24384.
0 commit comments