apache · ash211 · May 13, 2014 · May 13, 2014 · May 13, 2014 · May 13, 2014
diff --git a/docs/_config.yml b/docs/_config.yml
@@ -7,6 +7,6 @@ SPARK_VERSION: 1.0.0-SNAPSHOT
 SPARK_VERSION_SHORT: 1.0.0
 SCALA_BINARY_VERSION: "2.10"
 SCALA_VERSION: "2.10.4"
-MESOS_VERSION: 0.13.0
+MESOS_VERSION: 0.18.1
 SPARK_ISSUE_TRACKER_URL: https://issues.apache.org/jira/browse/SPARK
 SPARK_GITHUB_URL: https://github.com/apache/spark
diff --git a/docs/running-on-mesos.md b/docs/running-on-mesos.md
@@ -3,19 +3,123 @@ layout: global
 title: Running Spark on Mesos
 ---
 
-Spark can run on clusters managed by [Apache Mesos](http://mesos.apache.org/). Follow the steps below to install Mesos and Spark:
-
-1. Download and build Spark using the instructions [here](index.html). **Note:** Don't forget to consider what version of HDFS you might want to use!
-2. Download, build, install, and start Mesos {{site.MESOS_VERSION}} on your cluster. You can download the Mesos distribution from a [mirror](http://www.apache.org/dyn/closer.cgi/mesos/{{site.MESOS_VERSION}}/). See the Mesos [Getting Started](http://mesos.apache.org/gettingstarted) page for more information. **Note:** If you want to run Mesos without installing it into the default paths on your system (e.g., if you don't have administrative privileges to install it), you should also pass the `--prefix` option to `configure` to tell it where to install. For example, pass `--prefix=/home/user/mesos`. By default the prefix is `/usr/local`.
-3. Create a Spark "distribution" using `make-distribution.sh`.
-4. Rename the `dist` directory created from `make-distribution.sh` to `spark-{{site.SPARK_VERSION}}`.
-5. Create a `tar` archive: `tar czf spark-{{site.SPARK_VERSION}}.tar.gz spark-{{site.SPARK_VERSION}}`
-6. Upload this archive to HDFS or another place accessible from Mesos via `http://`, e.g., [Amazon Simple Storage Service](http://aws.amazon.com/s3): `hadoop fs -put spark-{{site.SPARK_VERSION}}.tar.gz /path/to/spark-{{site.SPARK_VERSION}}.tar.gz`
-7. Create a file called `spark-env.sh` in Spark's `conf` directory, by copying `conf/spark-env.sh.template`, and add the following lines to it:
-   * `export MESOS_NATIVE_LIBRARY=<path to libmesos.so>`. This path is usually `<prefix>/lib/libmesos.so` (where the prefix is `/usr/local` by default, see above). Also, on Mac OS X, the library is called `libmesos.dylib` instead of `libmesos.so`.
-   * `export SPARK_EXECUTOR_URI=<path to spark-{{site.SPARK_VERSION}}.tar.gz uploaded above>`.
-   * `export MASTER=mesos://HOST:PORT` where HOST:PORT is the host and port (default: 5050) of your Mesos master (or `zk://...` if using Mesos with ZooKeeper).
-8. To run a Spark application against the cluster, when you create your `SparkContext`, pass the string `mesos://HOST:PORT` as the master URL. In addition, you'll need to set the `spark.executor.uri` property. For example:
+# Why Mesos
+
+Spark can run on hardware clusters managed by [Apache Mesos](http://mesos.apache.org/).
+
+The advantages of deploying Spark with Mesos include:
+- dynamic partitioning between Spark and other
+  [frameworks](https://mesos.apache.org/documentation/latest/mesos-frameworks/)
+- scalable partitioning between multiple instances of Spark
+
+# How it works
+
+In a standalone cluster deployment, the cluster manager in the below diagram is a Spark master
+instance.  When using Mesos, the Mesos master replaces the Spark master as the cluster manager.
+
+<p style="text-align: center;">
+  <img src="img/cluster-overview.png" title="Spark cluster components" alt="Spark cluster components" />
+</p>
+
+Now when a driver creates a job and starts issuing tasks for scheduling, Mesos determines what
+machines handle what tasks.  Because it takes into account other frameworks when scheduling these
+many short-lived tasks, multiple frameworks can coexist on the same cluster without resorting to a
+static partitioning of resources.
+
+To get started, follow the steps below to install Mesos and deploy Spark jobs via Mesos.
+
+
+# Installing Mesos
+
+Spark {{site.SPARK_VERSION}} is designed for use with Mesos {{site.MESOS_VERSION}} and does not
+require any special patches of Mesos.
+
+If you already have a Mesos cluster running, you can skip this Mesos installation step.
+
+Otherwise, installing Mesos for Spark is no different than installing Mesos for use by other
+frameworks.  You can install Mesos using either prebuilt packages or by compiling from source.
+
+## Prebuilt packages
+
+The Apache Mesos project only publishes source package releases, no binary releases.  But other
+third party projects publish binary releases that may be helpful in setting Mesos up.
+
+One of those is Mesosphere.  To install Mesos using the binary releases provided by Mesosphere:
+
+1. Download Mesos installation package from [downloads page](http://mesosphere.io/downloads/)
+2. Follow their instructions for installation and configuration
+
+The Mesosphere installation documents suggest setting up ZooKeeper to handle Mesos master failover,
+but Mesos can be run without ZooKeeper using a single master as well.
+
+## From source
+
+To install Mesos directly from the upstream project rather than a third party, install from source.
+
+1. Download the Mesos distribution from a
+   [mirror](http://www.apache.org/dyn/closer.cgi/mesos/{{site.MESOS_VERSION}}/)
+2. Follow the Mesos [Getting Started](http://mesos.apache.org/gettingstarted) page for compiling and
+   installing Mesos
+
+**Note:** If you want to run Mesos without installing it into the default paths on your system
+(e.g., if you lack administrative privileges to install it), you should also pass the
+`--prefix` option to `configure` to tell it where to install. For example, pass
+`--prefix=/home/user/mesos`. By default the prefix is `/usr/local`.
+
+## Verification
+
+To verify that the Mesos cluster is ready for Spark, navigate to the Mesos master webui at port
+`:5050`  Confirm that all expected machines are present in the slaves tab.
+
+
+# Connecting Spark to Mesos
+
+To use Mesos from Spark, you need a Spark distribution available in a place accessible by Mesos, and
+a Spark driver program configured to connect to Mesos.
+
+## Uploading Spark Distribution
+
+When Mesos runs a task on a Mesos slave for the first time, that slave must have a distribution of
+Spark available for running the Spark Mesos executor backend.  A distribution of Spark is just a
+compiled binary version of Spark.
+
+The Spark distribution can be hosted at any Hadoop URI, including HTTP via `http://`, [Amazon Simple
+Storage Service](http://aws.amazon.com/s3) via `s3://`, or HDFS via `hdfs:///`.
+
+To use a precompiled distribution:
+
+1. Download a Spark distribution from the Spark [download page](https://spark.apache.org/downloads.html)  
+2. Upload to hdfs/http/s3
+
+To host on HDFS, use the Hadoop fs put command: `hadoop fs -put spark-{{site.SPARK_VERSION}}.tar.gz
+/path/to/spark-{{site.SPARK_VERSION}}.tar.gz`
+
+
+Or if you are using a custom-compiled version of Spark, you will need to create a distribution using
+the `make-distribution.sh` script included in a Spark source tarball/checkout.
+
+1. Download and build Spark using the instructions [here](index.html)
+2. Create a Spark distribution using `make-distribution.sh --tgz`.
+3. Upload archive to http/s3/hdfs
+
+
+## Using a Mesos Master URL
+
+The Master URLs for Mesos are in the form `mesos://host:5050` for a single-master Mesos
+cluster, or `zk://host:2181` for a multi-master Mesos cluster using ZooKeeper.
+
+The driver also needs some configuration in `spark-env.sh` to interact properly with Mesos:
+
+1. In `spark.env.sh` set some environment variables:
+ * `export MESOS_NATIVE_LIBRARY=<path to libmesos.so>`. This path is typically
+   `<prefix>/lib/libmesos.so` where the prefix is `/usr/local` by default. See Mesos installation
+   instructions above. On Mac OS X, the library is called `libmesos.dylib` instead of
+   `libmesos.so`.
+ * `export SPARK_EXECUTOR_URI=<path to spark-{{site.SPARK_VERSION}}.tar.gz uploaded above>`.
+2. Also set `spark.executor.uri` to <path to spark-{{site.SPARK_VERSION}}.tar.gz>
+
+Now when starting a Spark application against the cluster, pass a `mesos://`
+or `zk://` URL as the master when creating a `SparkContext`. For example:
 
 {% highlight scala %}
 val conf = new SparkConf()
@@ -25,31 +129,73 @@ val conf = new SparkConf()
 val sc = new SparkContext(conf)
 {% endhighlight %}
 
+When running a shell the `spark.executor.uri` parameter is inherited from `SPARK_EXECUTOR_URI`, so
+it does not need to be redundantly passed in as a system property.
+
+{% highlight shell %}
+./bin/spark-shell --master mesos://host:5050
+{% endhighlight %}
+
+
 # Mesos Run Modes
 
-Spark can run over Mesos in two modes: "fine-grained" and "coarse-grained". In fine-grained mode, which is the default,
-each Spark task runs as a separate Mesos task. This allows multiple instances of Spark (and other frameworks) to share
-machines at a very fine granularity, where each application gets more or fewer machines as it ramps up, but it comes with an
-additional overhead in launching each task, which may be inappropriate for low-latency applications (e.g. interactive queries or serving web requests). The coarse-grained mode will instead
-launch only *one* long-running Spark task on each Mesos machine, and dynamically schedule its own "mini-tasks" within
-it. The benefit is much lower startup overhead, but at the cost of reserving the Mesos resources for the complete duration
-of the application.
+Spark can run over Mesos in two modes: "fine-grained" (default) and "coarse-grained".
+
+In "fine-grained" mode (default), each Spark task runs as a separate Mesos task. This allows
+multiple instances of Spark (and other frameworks) to share machines at a very fine granularity,
+where each application gets more or fewer machines as it ramps up and down, but it comes with an
+additional overhead in launching each task. This mode may be inappropriate for low-latency
+requirements like interactive queries or serving web requests.
+
+The "coarse-grained" mode will instead launch only *one* long-running Spark task on each Mesos
+machine, and dynamically schedule its own "mini-tasks" within it. The benefit is much lower startup
+overhead, but at the cost of reserving the Mesos resources for the complete duration of the
+application.
 
-To run in coarse-grained mode, set the `spark.mesos.coarse` property in your [SparkConf](configuration.html#spark-properties):
+To run in coarse-grained mode, set the `spark.mesos.coarse` property in your
+[SparkConf](configuration.html#spark-properties):
 
 {% highlight scala %}
 conf.set("spark.mesos.coarse", "true")
 {% endhighlight %}
 
-In addition, for coarse-grained mode, you can control the maximum number of resources Spark will acquire. By default,
-it will acquire *all* cores in the cluster (that get offered by Mesos), which only makes sense if you run just one
-application at a time. You can cap the maximum number of cores using `conf.set("spark.cores.max", "10")` (for example).
+In addition, for coarse-grained mode, you can control the maximum number of resources Spark will
+acquire. By default, it will acquire *all* cores in the cluster (that get offered by Mesos), which
+only makes sense if you run just one application at a time. You can cap the maximum number of cores
+using `conf.set("spark.cores.max", "10")` (for example).
 
 
 # Running Alongside Hadoop
 
-You can run Spark and Mesos alongside your existing Hadoop cluster by just launching them as a separate service on the machines. To access Hadoop data from Spark, just use a hdfs:// URL (typically `hdfs://<namenode>:9000/path`, but you can find the right URL on your Hadoop Namenode's web UI).
+You can run Spark and Mesos alongside your existing Hadoop cluster by just launching them as a
+separate service on the machines. To access Hadoop data from Spark, a full hdfs:// URL is required
+(typically `hdfs://<namenode>:9000/path`, but you can find the right URL on your Hadoop Namenode web
+UI).
+
+In addition, it is possible to also run Hadoop MapReduce on Mesos for better resource isolation and
+sharing between the two. In this case, Mesos will act as a unified scheduler that assigns cores to
+either Hadoop or Spark, as opposed to having them share resources via the Linux scheduler on each
+node. Please refer to [Hadoop on Mesos](https://github.com/mesos/hadoop).
+
+In either case, HDFS runs separately from Hadoop MapReduce, without being scheduled through Mesos.
+
+
+# Troubleshooting and Debugging
+
+A few places to look during debugging:
+
+- Mesos master on port `:5050`
+  - Slaves should appear in the slaves tab
+  - Spark applications should appear in the frameworks tab
+  - Tasks should appear in the details of a framework
+  - Check the stdout and stderr of the sandbox of failed tasks
+- Mesos logs
+  - Master and slave logs are both in `/var/log/mesos` by default
 
-In addition, it is possible to also run Hadoop MapReduce on Mesos, to get better resource isolation and sharing between the two. In this case, Mesos will act as a unified scheduler that assigns cores to either Hadoop or Spark, as opposed to having them share resources via the Linux scheduler on each node. Please refer to [Hadoop on Mesos](https://github.com/mesos/hadoop).
+And common pitfalls:
 
-In either case, HDFS runs separately from Hadoop MapReduce, without going through Mesos.
+- Spark assembly not reachable/accessible
+  - Slaves need to be able to download the distribution
+- Firewall blocking communications
+  - Check for messages about failed connections
+  - Temporarily disable firewalls for debugging and then poke appropriate holes