From be312e074c02d95e43f6ff3b420e722d6ebc3f3b Mon Sep 17 00:00:00 2001
From: mcheah
Date: Thu, 12 Jan 2017 17:29:02 -0800
Subject: [PATCH 1/5] Documentation for the current state of the world.

---
 docs/running-on-kubernetes.md | 206 ++++++++++++++++++++++++++++++++++
 1 file changed, 206 insertions(+)
 create mode 100644 docs/running-on-kubernetes.md

diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md
new file mode 100644
index 0000000000000..0478360d7a91e
--- /dev/null
+++ b/docs/running-on-kubernetes.md
@@ -0,0 +1,206 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+
+Support for running on [Kubernetes](https://kubernetes.io/) is available in experimental status. The feature set is
+currently limited and not well-tested.
+
+## Setting Up Docker Images
+
+In order to run Spark on Kubernetes, a Docker image must be built and available on the Docker registry. Spark
+distributions include the Docker files for the driver and the executor at `dockerfiles/driver/Dockerfile` and
+`dockerfiles/executor/Dockerfile`, respectively. Use these Docker files to build the Docker images, and then tag them with
+the registry that the images should be sent to. Finally, push the images to the registry.
+
+For example, if the registry host is `registry-host` and the registry is listening on port 5000:
+
+    cd $SPARK_HOME
+    docker build -t registry-host:5000/spark-driver:latest -f dockerfiles/driver/Dockerfile .
+    docker build -t registry-host:5000/spark-executor:latest -f dockerfiles/executor/Dockerfile .
+    docker push registry-host:5000/spark-driver:latest
+    docker push registry-host:5000/spark-executor:latest
+
+## Submitting Applications to Kubernetes
+
+Kubernetes applications can be executed via `spark-submit`. For example, to compute the value of pi, assuming the
+docker images were set up as described above:
+
+    bin/spark-submit \
+      --deploy-mode cluster \
+      --class org.apache.spark.examples.SparkPi \
+      --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
+      --kubernetes-namespace default \
+      --conf spark.executor.instances=5 \
+      --conf spark.app.name=spark-pi \
+      --conf spark.kubernetes.driver.docker.image=registry-host:5000/spark-driver:latest \
+      --conf spark.kubernetes.executor.docker.image=registry-host:5000/spark-executor:latest \
+      examples/jars/spark_2.11-2.2.0.jar
+
+The Spark master, specified either via passing the `--master` command line argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the format `k8s://<api_server_url>`.
+
+Property Name | Default | Meaning
+
+  spark.kubernetes.namespace
+  (none)
+
+  The namespace that will be used for running the driver and executor pods. Must be specified. When using
+  spark-submit in cluster mode, this can also be passed to spark-submit> via the
+  --kubernetes-namespace command line argument.
+
+
+  spark.kubernetes.driver.docker.image
+  spark-driver:2.2.0
+
+  Docker image to use for the driver. Specify this using the standard Docker tag format.
+
+
+  spark.kubernetes.executor.docker.image
+  spark-executor:2.2.0
+
+  Docker image to use for the executor. Specify this using the standard Docker tag format.
+
+
+  spark.kubernetes.submit.caCertFile
+  (none)
+
+  CA Cert file for connecting to Kubernetes over HTTPs.
+
+
+  spark.kubernetes.submit.clientKeyFile
+  (none)
+
+  Client key file for authenticating against the Kubernetes API server.
+
+
+  spark.kubernetes.submit.clientCertFile
+  (none)
+
+  Client cert file for authenticating against the Kubernetes API server.
+
+
+  spark.kubernetes.submit.serviceAccountName
+  default
+
+  Service account that is used when running the driver pod. The driver pod uses this service account when requesting
+  executor pods from the API server.
+
+
+  spark.kubernetes.driver.uploads.jars
+  (none)
+
+  Comma-separated list of jars to be sent to the driver and all executors when submitting the application in cluster mode.
+  Refer to adding other jars for more information.
+
+
+  spark.kubernetes.driver.uploads.driverExtraClasspath
+  (none)
+
+  Comma-separated list of jars to be sent to the driver only when submitting the application in cluster mode.
+
+
+  spark.kubernetes.executor.memoryOverhead
+  executorMemory * 0.10, with minimum of 384
+
+  The amount of off-heap memory (in megabytes) to be allocated per executor. This is memory that accounts for things like VM overheads, interned strings, other native overheads, etc. This tends to grow with the executor size (typically 6-10%).
+
+
+## Current Limitations
+
+Running Spark on Kubernetes is currently an experimental feature. Some restrictions on the current implementation that should be
+lifted in the future include:
+* Applications can only use a fixed number of executors. Dynamic allocation is not supported.
+* Applications can only run in cluster mode.
+* The external shuffle service cannot be used.
+* Only Scala and Java applications can be run.

From cb90f64f04122b5164c55fb2362e14e752e15bf2 Mon Sep 17 00:00:00 2001
From: mcheah
Date: Thu, 12 Jan 2017 17:37:38 -0800
Subject: [PATCH 2/5] Adding navigation links from other pages

---
 docs/_layouts/global.html | 1 +
 docs/index.md             | 1 +
 2 files changed, 2 insertions(+)

diff --git a/docs/_layouts/global.html b/docs/_layouts/global.html
index c00d0db63cd10..3c786a6344066 100755
--- a/docs/_layouts/global.html
+++ b/docs/_layouts/global.html
@@ -99,6 +99,7 @@
                 <li><a href="spark-standalone.html">Spark Standalone</a></li>
                 <li><a href="running-on-mesos.html">Mesos</a></li>
                 <li><a href="running-on-yarn.html">YARN</a></li>
+                <li><a href="running-on-kubernetes.html">Kubernetes</a></li>
diff --git a/docs/index.md b/docs/index.md
index 57b9fa848f4a3..81d37aa5f63a1 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -113,6 +113,7 @@ options for deployment:
 * [Mesos](running-on-mesos.html): deploy a private cluster using [Apache Mesos](http://mesos.apache.org)
 * [YARN](running-on-yarn.html): deploy Spark on top of Hadoop NextGen (YARN)
+* [Kubernetes](running-on-kubernetes.html): deploy Spark on top of Kubernetes
 
 **Other Documents:**

From ab8436a9ea22cd41b2b278336e0967adb0156bf1 Mon Sep 17 00:00:00 2001
From: mcheah
Date: Fri, 13 Jan 2017 12:46:00 -0800
Subject: [PATCH 3/5] Address comments, add TODO for things that should be fixed

---
 docs/running-on-kubernetes.md | 67 +++++++++++++++++++++--------------
 1 file changed, 40 insertions(+), 27 deletions(-)

diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md
index 0478360d7a91e..af1fd4f9c5f34 100644
--- a/docs/running-on-kubernetes.md
+++ b/docs/running-on-kubernetes.md
@@ -4,11 +4,11 @@ title: Running Spark on Kubernetes
 ---
 
 Support for running on [Kubernetes](https://kubernetes.io/) is available in experimental status. The feature set is
-currently limited and not well-tested.
+currently limited and not well-tested. This should not be used in production environments.
 
 ## Setting Up Docker Images
 
-In order to run Spark on Kubernetes, a Docker image must be built and available on the Docker registry. Spark
+In order to run Spark on Kubernetes, a Docker image must be built and available on an accessible Docker registry. Spark
 distributions include the Docker files for the driver and the executor at `dockerfiles/driver/Dockerfile` and
 `dockerfiles/executor/Dockerfile`, respectively. Use these Docker files to build the Docker images, and then tag them with
 the registry that the images should be sent to. Finally, push the images to the registry.
@@ -24,7 +24,7 @@ For example, if the registry host is `registry-host` and the registry is listening on port 5000:
 ## Submitting Applications to Kubernetes
 
 Kubernetes applications can be executed via `spark-submit`. For example, to compute the value of pi, assuming the
-docker images were set up as described above:
+Docker images were set up as described above:
 
     bin/spark-submit \
       --deploy-mode cluster \
@@ -37,32 +37,36 @@ docker images were set up as described above:
       --conf spark.kubernetes.executor.docker.image=registry-host:5000/spark-executor:latest \
       examples/jars/spark_2.11-2.2.0.jar
+
 The Spark master, specified either via passing the `--master` command line argument to `spark-submit` or by setting
 `spark.master` in the application's configuration, must be a URL with the format `k8s://<api_server_url>`.
 
 * A main application resource path that does not have a scheme or that has the scheme `file://` is assumed to be on the
   *disk of the submitting machine*. This resource is uploaded to the driver docker container before executing the
   application. A remote path can still be specified and the resource will be fetched from the appropriate location.
 
-Below are some examples of providing application dependencies.
+In all of these cases, the jars are placed on the driver's classpath, and are also sent to the executors. Below are some
+examples of providing application dependencies.
 To submit an application with both the main resource and two other jars living on the submitting user's machine:
 
     bin/spark-submit \
@@ -70,19 +74,20 @@ To submit an application with both the main resource and two other jars living on the submitting user's machine:
       --deploy-mode cluster \
       --class com.example.applications.SampleApplication \
       --master k8s://https://192.168.99.100 \
-      --kubernetes-namespace spark.kubernetes.namespace=default \
+      --kubernetes-namespace default \
       --upload-jars /home/exampleuser/exampleapplication/dep1.jar,/home/exampleuser/exampleapplication/dep2.jar \
       --conf spark.kubernetes.driver.docker.image=registry-host:5000/spark-driver:latest \
       --conf spark.kubernetes.executor.docker.image=registry-host:5000/spark-executor:latest \
       /home/exampleuser/exampleapplication/main.jar
 
-Note that the above is equivalent to this command:
+Note that since passing the jars through the `--upload-jars` command line argument is equivalent to setting the
+`spark.kubernetes.driver.uploads.jars` Spark property, the above will behave identically to this command:
 
     bin/spark-submit \
       --deploy-mode cluster \
       --class com.example.applications.SampleApplication \
       --master k8s://https://192.168.99.100 \
-      --kubernetes-namespace spark.kubernetes.namespace=default \
+      --kubernetes-namespace default \
       --conf spark.kubernetes.driver.uploads.jars=/home/exampleuser/exampleapplication/dep1.jar,/home/exampleuser/exampleapplication/dep2.jar \
       --conf spark.kubernetes.driver.docker.image=registry-host:5000/spark-driver:latest \
       --conf spark.kubernetes.executor.docker.image=registry-host:5000/spark-executor:latest \
       /home/exampleuser/exampleapplication/main.jar
@@ -95,19 +100,20 @@ is located in the jar `/opt/spark-plugins/app-plugin.jar` on the docker image's
       --deploy-mode cluster \
       --class com.example.applications.PluggableApplication \
       --master k8s://https://192.168.99.100 \
-      --kubernetes-namespace spark.kubernetes.namespace=default \
+      --kubernetes-namespace default \
       --jars /opt/spark-plugins/app-plugin.jar \
       --conf spark.kubernetes.driver.docker.image=registry-host:5000/spark-driver-custom:latest \
       --conf spark.kubernetes.executor.docker.image=registry-host:5000/spark-executor:latest \
       http://example.com:8080/applications/sparkpluggable/app.jar
 
-Note that the above is equivalent to this command:
+Note that since passing the jars through the `--jars` command line argument is equivalent to setting the `spark.jars`
+Spark property, the above will behave identically to this command:
 
     bin/spark-submit \
       --deploy-mode cluster \
       --class com.example.applications.PluggableApplication \
       --master k8s://https://192.168.99.100 \
-      --kubernetes-namespace spark.kubernetes.namespace=default \
+      --kubernetes-namespace default \
       --conf spark.jars=file:///opt/spark-plugins/app-plugin.jar \
       --conf spark.kubernetes.driver.docker.image=registry-host:5000/spark-driver-custom:latest \
       --conf spark.kubernetes.executor.docker.image=registry-host:5000/spark-executor:latest \
       http://example.com:8080/applications/sparkpluggable/app.jar
@@ -122,10 +128,11 @@ from the other deployment modes. See the [configuration page](configuration.html
 Property Name | Default | Meaning
 
   spark.kubernetes.namespace
   (none)
 
   The namespace that will be used for running the driver and executor pods. Must be specified. When using
-  spark-submit in cluster mode, this can also be passed to spark-submit> via the
+  spark-submit in cluster mode, this can also be passed to spark-submit via the
   --kubernetes-namespace command line argument.
 
@@ -133,35 +140,39 @@
   spark.kubernetes.driver.docker.image
   spark-driver:2.2.0
 
-  Docker image to use for the driver. Specify this using the standard Docker tag format.
+  Docker image to use for the driver. Specify this using the standard
+  Docker tag format.
 
 
   spark.kubernetes.executor.docker.image
   spark-executor:2.2.0
 
-  Docker image to use for the executor. Specify this using the standard Docker tag format.
+  Docker image to use for the executors. Specify this using the standard
+  Docker tag format.
 
 
   spark.kubernetes.submit.caCertFile
   (none)
 
-  CA Cert file for connecting to Kubernetes over HTTPs.
+  CA cert file for connecting to Kubernetes over SSL. This file should be located on the submitting machine's disk.
 
 
   spark.kubernetes.submit.clientKeyFile
   (none)
 
-  Client key file for authenticating against the Kubernetes API server.
+  Client key file for authenticating against the Kubernetes API server. This file should be located on the submitting
+  machine's disk.
 
 
   spark.kubernetes.submit.clientCertFile
   (none)
 
-  Client cert file for authenticating against the Kubernetes API server.
+  Client cert file for authenticating against the Kubernetes API server. This file should be located on the submitting
+  machine's disk.
 
@@ -176,11 +187,12 @@
   spark.kubernetes.driver.uploads.jars
   (none)
 
-  Comma-separated list of jars to be sent to the driver and all executors when submitting the application in cluster mode.
-  Refer to adding other jars for more information.
+  Comma-separated list of jars to be sent to the driver and all executors when submitting the application in cluster
+  mode. Refer to adding other jars for more information.
 
+
   spark.kubernetes.driver.uploads.driverExtraClasspath
   (none)
 
@@ -191,16 +203,17 @@
   spark.kubernetes.executor.memoryOverhead
   executorMemory * 0.10, with minimum of 384
 
-  The amount of off-heap memory (in megabytes) to be allocated per executor. This is memory that accounts for things like VM overheads, interned strings, other native overheads, etc. This tends to grow with the executor size (typically 6-10%).
+  The amount of off-heap memory (in megabytes) to be allocated per executor. This is memory that accounts for things
+  like VM overheads, interned strings, other native overheads, etc. This tends to grow with the executor size
+  (typically 6-10%).
 
 
 ## Current Limitations
 
-Running Spark on Kubernetes is currently an experimental feature. Some restrictions on the current implementation that should be
-lifted in the future include:
+Running Spark on Kubernetes is currently an experimental feature. Some restrictions on the current implementation that
+should be lifted in the future include:
 * Applications can only use a fixed number of executors. Dynamic allocation is not supported.
 * Applications can only run in cluster mode.
-* The external shuffle service cannot be used.
 * Only Scala and Java applications can be run.

From 6976b81798e81751db7b8442b958aac5d627418c Mon Sep 17 00:00:00 2001
From: mcheah
Date: Fri, 13 Jan 2017 13:23:44 -0800
Subject: [PATCH 4/5] Address comments, mostly making images section clearer

---
 docs/running-on-kubernetes.md | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md
index af1fd4f9c5f34..4567a6073fe63 100644
--- a/docs/running-on-kubernetes.md
+++ b/docs/running-on-kubernetes.md
@@ -8,10 +8,15 @@ currently limited and not well-tested. This should not be used in production environments.
 
 ## Setting Up Docker Images
 
-In order to run Spark on Kubernetes, a Docker image must be built and available on an accessible Docker registry. Spark
-distributions include the Docker files for the driver and the executor at `dockerfiles/driver/Dockerfile` and
-`dockerfiles/executor/Dockerfile`, respectively. Use these Docker files to build the Docker images, and then tag them with
-the registry that the images should be sent to. Finally, push the images to the registry.
+Kubernetes requires users to supply images that can be deployed into containers within pods. The images are built to
+be run in a virtual runtime environment that Kubernetes supports. Docker is a virtual runtime environment that is
+frequently used with Kubernetes, so Spark provides some support for working with Docker to get started quickly.
+
+To use Spark on Kubernetes with Docker, images for the driver and the executors need to be built and published to an
+accessible Docker registry. Spark distributions include the Docker files for the driver and the executor at
+`dockerfiles/driver/Dockerfile` and `dockerfiles/executor/Dockerfile`, respectively. Use these Docker files to build the
+Docker images, and then tag them with the registry that the images should be sent to. Finally, push the images to the
+registry.
 
 For example, if the registry host is `registry-host` and the registry is listening on port 5000:
@@ -23,8 +28,8 @@
 ## Submitting Applications to Kubernetes
 
-Kubernetes applications can be executed via `spark-submit`. For example, to compute the value of pi, assuming the
-Docker images were set up as described above:
+Kubernetes applications can be executed via `spark-submit`. For example, to compute the value of pi, assuming the images
+are set up as described above:
 
     bin/spark-submit \
       --deploy-mode cluster \
@@ -39,9 +44,9 @@ Docker images were set up as described above:
 
 The Spark master, specified either via passing the `--master` command line argument to `spark-submit` or by setting
-`spark.master` in the application's configuration, must be a URL with the format `k8s://<api_server_url>`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on the Kubernetes cluster, with the API server
+being contacted at `api_server_url`. The HTTP protocol must also be specified.
 
 Note that applications can currently only be executed in cluster mode, where the driver and its executors are
 running on the cluster.

From 420111b5112b4fe2de2248e9f557d9334f3c7a1b Mon Sep 17 00:00:00 2001
From: mcheah
Date: Fri, 13 Jan 2017 14:08:47 -0800
Subject: [PATCH 5/5] Virtual runtime -> container runtime

---
 docs/running-on-kubernetes.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md
index 4567a6073fe63..5192d9d086618 100644
--- a/docs/running-on-kubernetes.md
+++ b/docs/running-on-kubernetes.md
@@ -9,7 +9,7 @@ currently limited and not well-tested. This should not be used in production environments.
 ## Setting Up Docker Images
 
 Kubernetes requires users to supply images that can be deployed into containers within pods. The images are built to
-be run in a virtual runtime environment that Kubernetes supports. Docker is a virtual runtime environment that is
+be run in a container runtime environment that Kubernetes supports. Docker is a container runtime environment that is
 frequently used with Kubernetes, so Spark provides some support for working with Docker to get started quickly.
 
 To use Spark on Kubernetes with Docker, images for the driver and the executors need to be built and published to an
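
As a quick end-to-end sanity check of the workflow documented in this series, the following sketch strings the documented commands together. It assumes a hypothetical registry at `registry-host:5000` and a hypothetical API server at `https://192.168.99.100:8443`; substitute values for your own cluster, and note that the `--kubernetes-namespace` flag and the `spark.kubernetes.*` properties are only available in builds that include this experimental Kubernetes support.

    # Build and publish the driver and executor images, using the Dockerfiles
    # shipped with the Spark distribution as described above.
    cd $SPARK_HOME
    docker build -t registry-host:5000/spark-driver:latest -f dockerfiles/driver/Dockerfile .
    docker build -t registry-host:5000/spark-executor:latest -f dockerfiles/executor/Dockerfile .
    docker push registry-host:5000/spark-driver:latest
    docker push registry-host:5000/spark-executor:latest

    # Submit SparkPi in cluster mode against the (hypothetical) API server address.
    bin/spark-submit \
      --deploy-mode cluster \
      --class org.apache.spark.examples.SparkPi \
      --master k8s://https://192.168.99.100:8443 \
      --kubernetes-namespace default \
      --conf spark.executor.instances=2 \
      --conf spark.kubernetes.driver.docker.image=registry-host:5000/spark-driver:latest \
      --conf spark.kubernetes.executor.docker.image=registry-host:5000/spark-executor:latest \
      examples/jars/spark_2.11-2.2.0.jar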