Spark UI not accessible #57

@h4gen

Hi everybody,

As @ryanlovett asked, I am opening this issue here. It is related to jupyterhub/zero-to-jupyterhub-k8s#1030.
The problem is as follows:

After starting PySpark I am not able to access the Spark UI; the request results in a JupyterHub 404 error.
Here is (hopefully) all the relevant information:

  1. I create a new user image from the jupyter/pyspark-notebook image.
  2. The Dockerfile for this image contains:
FROM jupyter/pyspark-notebook:5b2160dfd919
RUN pip install nbserverproxy
RUN jupyter serverextension enable --py nbserverproxy
USER root
RUN echo "$NB_USER ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/notebook
USER $NB_USER
  3. I create the SparkContext() in the pod that was started from this image; Spark then gives me the link to the UI.
  4. The SparkContext() is created with the following config:
import os
from pyspark import SparkConf, SparkContext

conf = SparkConf()
# Use the in-cluster Kubernetes API server as the Spark master
conf.setMaster('k8s://https://' + os.environ['KUBERNETES_SERVICE_HOST'] + ':443')
conf.set('spark.kubernetes.container.image', 'idalab/spark-py:spark')
conf.set('spark.submit.deployMode', 'client')
conf.set('spark.executor.instances', '2')
conf.setAppName('pyspark-shell')
conf.set('spark.driver.host', '10.16.205.42')
os.environ['PYSPARK_PYTHON'] = 'python3'
os.environ['PYSPARK_DRIVER_PYTHON'] = 'python3'
sc = SparkContext(conf=conf)
  5. The link created by Spark is obviously not accessible through the hub, as it points to <POD_IP>:4040.
  6. I try to access the UI via .../username/proxy/4040 and .../username/proxy/4040/. Neither works; both lead to a JupyterHub 404.
  7. Other ports are accessible via this method, so I assume nbserverproxy is working correctly.
  8. This is the output of netstat -pl:
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 localhost:54695         0.0.0.0:*               LISTEN      23/python
tcp        0      0 localhost:33896         0.0.0.0:*               LISTEN      23/python
tcp        0      0 localhost:34577         0.0.0.0:*               LISTEN      23/python
tcp        0      0 localhost:52211         0.0.0.0:*               LISTEN      23/python
tcp        0      0 0.0.0.0:8888            0.0.0.0:*               LISTEN      7/python
tcp        0      0 localhost:39388         0.0.0.0:*               LISTEN      23/python
tcp        0      0 localhost:39971         0.0.0.0:*               LISTEN      23/python
tcp        0      0 localhost:32867         0.0.0.0:*               LISTEN      23/python
tcp6       0      0 jupyter-hagen:43878     [::]:*                  LISTEN      45/java
tcp6       0      0 [::]:4040               [::]:*                  LISTEN      45/java
tcp6       0      0 localhost:32816         [::]:*                  LISTEN      45/java
tcp6       0      0 jupyter-hagen:41793     [::]:*                  LISTEN      45/java

One can see that the Java processes are listed in a different format because they listen on tcp6 sockets.
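
A minimal sketch for checking, from inside the pod, whether a port accepts connections on the IPv4 and on the IPv6 loopback address. The helper names `probe` and `probe_loopbacks` are hypothetical and not part of nbserverproxy or Spark; the sketch only reports which loopback address accepts a TCP connection and says nothing about how the proxy itself connects.

```python
import socket


def probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unavailable address families.
        return False


def probe_loopbacks(port: int) -> dict:
    """Probe the IPv4 and IPv6 loopback addresses separately."""
    return {addr: probe(addr, port) for addr in ("127.0.0.1", "::1")}


if __name__ == "__main__":
    # 4040 is the Spark UI port shown in the netstat output.
    print(probe_loopbacks(4040))
```

If both entries are `False` while netstat shows a listener on 4040, the problem is probably not limited to the address family.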

  9. To check whether this is the cause, I set the environment variable '_JAVA_OPTIONS' to "-Djava.net.preferIPv4Stack=true".

  10. This results in the following output, but does not resolve the problem:

Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 localhost:54695         0.0.0.0:*               LISTEN      456/python
tcp        0      0 0.0.0.0:4040            0.0.0.0:*               LISTEN      475/java
tcp        0      0 localhost:33896         0.0.0.0:*               LISTEN      456/python
tcp        0      0 localhost:34990         0.0.0.0:*               LISTEN      475/java
tcp        0      0 localhost:36079         0.0.0.0:*               LISTEN      456/python
tcp        0      0 jupyter-hagen:35119     0.0.0.0:*               LISTEN      475/java
tcp        0      0 localhost:34577         0.0.0.0:*               LISTEN      456/python
tcp        0      0 jupyter-hagen:42195     0.0.0.0:*               LISTEN      475/java
tcp        0      0 localhost:34836         0.0.0.0:*               LISTEN      456/python
tcp        0      0 0.0.0.0:8888            0.0.0.0:*               LISTEN      7/python
tcp        0      0 localhost:39971         0.0.0.0:*               LISTEN      456/python
tcp        0      0 localhost:32867         0.0.0.0:*               LISTEN      456/python
  11. I checked whether the UI is accessible in general by running the user image locally on my PC and forwarding the port. That works fine!

My user image is available on Docker Hub as idalab/spark-user:1.0.2, so it should be easy to inject for debugging if necessary.
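
For anyone reproducing this, the proxy paths tried above can be assembled from the single-user server's base URL. This is a rough sketch that assumes JupyterHub exposes that base URL in the `JUPYTERHUB_SERVICE_PREFIX` environment variable; the function name `proxy_url` and the example prefix `/user/hagen/` are hypothetical.

```python
import os
from typing import Optional


def proxy_url(port: int, prefix: Optional[str] = None) -> str:
    """Build the nbserverproxy-style path for a port behind the notebook server."""
    if prefix is None:
        # Assumption: JupyterHub sets this for single-user servers.
        prefix = os.environ.get("JUPYTERHUB_SERVICE_PREFIX", "/")
    # Normalize so that exactly one slash separates the prefix and "proxy".
    return f"{prefix.rstrip('/')}/proxy/{port}/"


if __name__ == "__main__":
    # Path for the Spark UI port used in this issue.
    print(proxy_url(4040))
```

The returned path is relative to the hub's host and corresponds to the .../username/proxy/4040/ form with the trailing slash.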

Thank you for your help.
