This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Error: Only local python files are supported: gs://... #527

@paulreimer

Description

I extended the Docker image from the recent spark-2.2.0-k8s-0.4.0-bin-2.7.3 release to add the GCS (Google Cloud Storage) connector.

Observed:
It works great for Scala jobs/jars with a gs://<bucket>/ prefix: the init container is created, and the Spark files are populated from what was already in GCS. However, when I try to submit a Python job (or use --py-files), the spark-submit client does not accept the gs:// prefix and rejects the job:

Error: Only local python files are supported: gs://<my_bucket_name>/pi.py
Run with --help for usage help or --verbose for debug output
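
For reference, my submission looked roughly like this. It is a minimal sketch: the API server address, registry, and bucket name are placeholders, and the flags follow this fork's documented usage as I understand it:

bin/spark-submit \
  --deploy-mode cluster \
  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
  --kubernetes-namespace default \
  --conf spark.app.name=spark-pi \
  --conf spark.kubernetes.driver.docker.image=<registry>/spark-driver-py:v2.2.0-kubernetes-0.4.0 \
  --conf spark.kubernetes.executor.docker.image=<registry>/spark-executor-py:v2.2.0-kubernetes-0.4.0 \
  gs://<my_bucket_name>/pi.py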

Expected:
spark-submit should accept the job, the relevant files should be populated by an init container, and they should be available for the spark-driver-py and spark-executor-py images to use successfully.

(FYI: to add the GCS connector, I added these lines to the spark-base Dockerfile:)

# Add Hadoop 2.x native libs.
ENV hadoop_ver 2.7.4
# ADD does not auto-extract remote tarballs, so unpack and symlink explicitly.
ADD http://www.us.apache.org/dist/hadoop/common/hadoop-${hadoop_ver}/hadoop-${hadoop_ver}.tar.gz /opt/
RUN cd /opt/ && \
    tar xf hadoop-${hadoop_ver}.tar.gz && \
    ln -s hadoop-${hadoop_ver} hadoop

# Add the GCS connector jar to Spark's classpath.
ADD https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-latest-hadoop2.jar ${SPARK_HOME}/jars/
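
(Note, in case it helps others: the jar alone may not be enough; the gs:// scheme typically also has to be registered with the Hadoop configuration. A minimal sketch of doing that at submit time via Spark's spark.hadoop.* configuration passthrough; the property names are the GCS connector's standard keys, and the keyfile path is a placeholder:)

# Appended to the spark-submit invocation above:
--conf spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem \
--conf spark.hadoop.fs.AbstractFileSystem.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS \
--conf spark.hadoop.google.cloud.auth.service.account.enable=true \
--conf spark.hadoop.google.cloud.auth.service.account.json.keyfile=/path/to/keyfile.json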
