
Commit 0368dd1

moved Python Support section

1 parent a7b5319

File tree

1 file changed: 46 additions, 45 deletions


src/jekyll/running-on-kubernetes.md

Lines changed: 46 additions & 45 deletions
@@ -115,51 +115,7 @@ Finally, notice that in the above example we specify a jar with a specific URI w
 the location of the example jar that is already in the Docker image. Using dependencies that are on your machine's local
 disk is discussed below.
 
-## Dependency Management
-
-Application dependencies that are being submitted from your machine need to be sent to a **resource staging server**
-that the driver and executor can then communicate with to retrieve those dependencies. A YAML file denoting a minimal
-set of Kubernetes resources that runs this service is located in the file `conf/kubernetes-resource-staging-server.yaml`.
-This YAML file configures a Deployment with one pod running the resource staging server configured with a ConfigMap,
-and exposes the server through a Service with a fixed NodePort. Deploying a resource staging server with the included
-YAML file requires you to have permissions to create Deployments, Services, and ConfigMaps.
-
-To run the resource staging server with default configurations, the Kubernetes resources can be created:
-
-    kubectl create -f conf/kubernetes-resource-staging-server.yaml
-
-and then you can compute the value of Pi as follows:
-
-    bin/spark-submit \
-      --deploy-mode cluster \
-      --class org.apache.spark.examples.SparkPi \
-      --master k8s://<k8s-apiserver-host>:<k8s-apiserver-port> \
-      --kubernetes-namespace default \
-      --conf spark.executor.instances=5 \
-      --conf spark.app.name=spark-pi \
-      --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.1.0-kubernetes-0.3.0 \
-      --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.1.0-kubernetes-0.3.0 \
-      --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.3.0 \
-      --conf spark.kubernetes.resourceStagingServer.uri=http://<address-of-any-cluster-node>:31000 \
-      examples/jars/spark_examples_2.11-2.2.0.jar
-
-The Docker image for the resource staging server may also be built from source, in a similar manner to the driver
-and executor images. The Dockerfile is provided in `dockerfiles/resource-staging-server/Dockerfile`.
-
-The provided YAML file specifically sets the NodePort to 31000 on the service's specification. If port 31000 is not
-available on any of the nodes of your cluster, you should remove the NodePort field from the service's specification
-and allow the Kubernetes cluster to determine the NodePort itself. Be sure to provide the correct port in the resource
-staging server URI when submitting your application, in accordance to the NodePort chosen by the Kubernetes cluster.
-
-### Dependency Management Without The Resource Staging Server
-
-Note that this resource staging server is only required for submitting local dependencies. If your application's
-dependencies are all hosted in remote locations like HDFS or http servers, they may be referred to by their appropriate
-remote URIs. Also, application dependencies can be pre-mounted into custom-built Docker images. Those dependencies
-can be added to the classpath by referencing them with `local://` URIs and/or setting the `SPARK_EXTRA_CLASSPATH`
-environment variable in your Dockerfiles.
-
-### Python Support
+## Python Support
 
 With the ever growing support for Python by data scientists, we have supported the submission of PySpark applications.
 These applications follow the general syntax that you would expect from other cluster managers. The submission of a PySpark
@@ -226,6 +182,51 @@ command with your appropriate file (i.e. MY_SPARK_FILE)
       -Xms$SPARK_DRIVER_MEMORY -Xmx$SPARK_DRIVER_MEMORY \
       $SPARK_DRIVER_CLASS $PYSPARK_PRIMARY MY_PYSPARK_FILE,$PYSPARK_FILES $SPARK_DRIVER_ARGS
 
+
+## Dependency Management
+
+Application dependencies that are being submitted from your machine need to be sent to a **resource staging server**
+that the driver and executor can then communicate with to retrieve those dependencies. A YAML file denoting a minimal
+set of Kubernetes resources that runs this service is located in the file `conf/kubernetes-resource-staging-server.yaml`.
+This YAML file configures a Deployment with one pod running the resource staging server configured with a ConfigMap,
+and exposes the server through a Service with a fixed NodePort. Deploying a resource staging server with the included
+YAML file requires you to have permissions to create Deployments, Services, and ConfigMaps.
+
+To run the resource staging server with default configurations, the Kubernetes resources can be created:
+
+    kubectl create -f conf/kubernetes-resource-staging-server.yaml
+
+and then you can compute the value of Pi as follows:
+
+    bin/spark-submit \
+      --deploy-mode cluster \
+      --class org.apache.spark.examples.SparkPi \
+      --master k8s://<k8s-apiserver-host>:<k8s-apiserver-port> \
+      --kubernetes-namespace default \
+      --conf spark.executor.instances=5 \
+      --conf spark.app.name=spark-pi \
+      --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.1.0-kubernetes-0.3.0 \
+      --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.1.0-kubernetes-0.3.0 \
+      --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.3.0 \
+      --conf spark.kubernetes.resourceStagingServer.uri=http://<address-of-any-cluster-node>:31000 \
+      examples/jars/spark_examples_2.11-2.2.0.jar
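The `spark.kubernetes.resourceStagingServer.uri` value above is simply an HTTP URL built from the address of any cluster node and the staging server Service's NodePort. A trivial sketch of composing it (the address and port values here are illustrative placeholders, not real cluster values):

```shell
# Build the value for spark.kubernetes.resourceStagingServer.uri from a
# cluster node address and the staging server Service's NodePort.
# Both values below are illustrative placeholders.
NODE_ADDRESS="192.168.99.100"
STAGING_NODE_PORT="31000"
STAGING_URI="http://${NODE_ADDRESS}:${STAGING_NODE_PORT}"
echo "${STAGING_URI}"
```

The resulting string is what `--conf spark.kubernetes.resourceStagingServer.uri=...` expects.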
+
+The Docker image for the resource staging server may also be built from source, in a similar manner to the driver
+and executor images. The Dockerfile is provided in `dockerfiles/resource-staging-server/Dockerfile`.
+
+The provided YAML file specifically sets the NodePort to 31000 on the service's specification. If port 31000 is not
+available on any of the nodes of your cluster, you should remove the NodePort field from the service's specification
+and allow the Kubernetes cluster to determine the NodePort itself. Be sure to provide the correct port in the resource
+staging server URI when submitting your application, in accordance to the NodePort chosen by the Kubernetes cluster.
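For reference, the NodePort portion of such a Service specification looks roughly like the sketch below; the resource name and in-cluster port are illustrative assumptions, not taken from the bundled YAML file:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: spark-resource-staging-service   # illustrative name
spec:
  type: NodePort
  ports:
    - port: 10000          # illustrative in-cluster port
      targetPort: 10000
      nodePort: 31000      # remove this field to let Kubernetes assign a free port
```

If Kubernetes assigns the port for you, `kubectl get svc` shows the chosen NodePort to use in the staging server URI.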
+
+### Dependency Management Without The Resource Staging Server
+
+Note that this resource staging server is only required for submitting local dependencies. If your application's
+dependencies are all hosted in remote locations like HDFS or http servers, they may be referred to by their appropriate
+remote URIs. Also, application dependencies can be pre-mounted into custom-built Docker images. Those dependencies
+can be added to the classpath by referencing them with `local://` URIs and/or setting the `SPARK_EXTRA_CLASSPATH`
+environment variable in your Dockerfiles.
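As a sketch of the pre-mounting approach, a custom driver image might copy a dependency into the image and put it on the classpath; the jar name and in-image path below are illustrative assumptions:

```dockerfile
FROM kubespark/spark-driver:v2.1.0-kubernetes-0.3.0
# Pre-mount the dependency into the image (illustrative path)
COPY my-app-dependencies.jar /opt/spark/extra/my-app-dependencies.jar
# Add it to the driver JVM's classpath
ENV SPARK_EXTRA_CLASSPATH /opt/spark/extra/my-app-dependencies.jar
```

The same jar could then also be referenced at submission time as `local:///opt/spark/extra/my-app-dependencies.jar`.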
+
 ### Accessing Kubernetes Clusters
 
 Spark-submit also supports submission through the
