- Merge of #13599 ("virtualenv in pyspark", bug SPARK-13587)
  and #5408 ("wheel package support for Pyspark", bug SPARK-6764)
- Documentation updated
- Only Standalone and YARN are supported; Mesos is not supported
- Only tested with virtualenv/pip; Conda not tested
- Client deployment + pip install with index: OK (1 min 30 s execution)
- Client deployment + wheelhouse without index: KO
  (cffi rejects the built wheel)
Signed-off-by: Gaetan Semet <[email protected]>
docs/programming-guide.md: 8 additions & 8 deletions
@@ -24,7 +24,7 @@ along with if you launch Spark's interactive shell -- either `bin/spark-shell` f
 
 <div data-lang="scala" markdown="1">
 
-Spark {{site.SPARK_VERSION}} is built and distributed to work with Scala {{site.SCALA_BINARY_VERSION}}
+Spark {{site.SPARK_VERSION}} is built and distributed to work with Scala {{site.SCALA_BINARY_VERSION}}
 by default. (Spark can be built to work with other versions of Scala, too.) To write
 applications in Scala, you will need to use a compatible Scala version (e.g. {{site.SCALA_BINARY_VERSION}}.X).
 
@@ -211,7 +211,7 @@ For a complete list of options, run `spark-shell --help`. Behind the scenes,
 
 In the PySpark shell, a special interpreter-aware SparkContext is already created for you, in the
 variable called `sc`. Making your own SparkContext will not work. You can set which master the
-context connects to using the `--master` argument, and you can add Python .zip, .egg or .py files
+context connects to using the `--master` argument, and you can add Python .zip, .egg, .whl or .py files
 to the runtime path by passing a comma-separated list to `--py-files`. You can also add dependencies
 (e.g. Spark Packages) to your shell session by supplying a comma-separated list of maven coordinates
 to the `--packages` argument. Any additional repositories where dependencies might exist (e.g. SonaType)
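For readers skimming this hunk: a minimal PySpark sketch of what accepting `.whl` on `--py-files`/`addPyFile` would look like in practice. The wheel path and the `requests` module below are assumptions for illustration; stock Spark at this point only accepts `.zip`, `.egg` and `.py` here, so this relies on the wheel support this PR proposes.

```python
from pyspark import SparkConf, SparkContext

# Assumed local wheel path, purely for illustration.
WHEEL = "/tmp/deps/requests-2.9.1-py2.py3-none-any.whl"

conf = SparkConf().setAppName("wheel-demo")
# Ship the wheel to executors the same way a .zip/.egg would be shipped;
# command-line equivalent: spark-submit --py-files <wheel> app.py
sc = SparkContext(conf=conf, pyFiles=[WHEEL])
# ...or after startup:
# sc.addPyFile(WHEEL)

def use_dep(x):
    import requests  # resolved from the shipped wheel on the executor
    return x

print(sc.parallelize(range(4)).map(use_dep).count())
sc.stop()
```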
@@ -240,13 +240,13 @@ use IPython, set the `PYSPARK_DRIVER_PYTHON` variable to `ipython` when running
 $ PYSPARK_DRIVER_PYTHON=ipython ./bin/pyspark
 {% endhighlight %}
 
-To use the Jupyter notebook (previously known as the IPython notebook),
+To use the Jupyter notebook (previously known as the IPython notebook),
 
 {% highlight bash %}
 $ PYSPARK_DRIVER_PYTHON=jupyter ./bin/pyspark
 {% endhighlight %}
 
-You can customize the `ipython` or `jupyter` commands by setting `PYSPARK_DRIVER_PYTHON_OPTS`.
+You can customize the `ipython` or `jupyter` commands by setting `PYSPARK_DRIVER_PYTHON_OPTS`.
 
 After the Jupyter Notebook server is launched, you can create a new "Python 2" notebook from
 the "Files" tab. Inside the notebook, you can input the command `%pylab inline` as part of
@@ -812,7 +812,7 @@ The variables within the closure sent to each executor are now copies and thus,
 
 In local mode, in some circumstances the `foreach` function will actually execute within the same JVM as the driver and will reference the same original **counter**, and may actually update it.
 
-To ensure well-defined behavior in these sorts of scenarios one should use an [`Accumulator`](#accumulators). Accumulators in Spark are used specifically to provide a mechanism for safely updating a variable when execution is split up across worker nodes in a cluster. The Accumulators section of this guide discusses these in more detail.
+To ensure well-defined behavior in these sorts of scenarios one should use an [`Accumulator`](#accumulators). Accumulators in Spark are used specifically to provide a mechanism for safely updating a variable when execution is split up across worker nodes in a cluster. The Accumulators section of this guide discusses these in more detail.
 
 In general, closures - constructs like loops or locally defined methods, should not be used to mutate some global state. Spark does not define or guarantee the behavior of mutations to objects referenced from outside of closures. Some code that does this may work in local mode, but that's just by accident and such code will not behave as expected in distributed mode. Use an Accumulator instead if some global aggregation is needed.
 
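To make the closure-versus-accumulator contrast concrete, a small PySpark sketch (an illustration added here, not part of the patch):

```python
from pyspark import SparkContext

sc = SparkContext("local[2]", "accumulator-demo")

counter = 0                  # plain closure variable: each worker mutates its own copy
accum = sc.accumulator(0)    # accumulator: updates are merged back on the driver

def count_both(x):
    global counter
    counter += 1
    accum.add(1)

sc.parallelize(range(10)).foreach(count_both)

print(counter)      # unreliable; the driver-side value is typically still 0
print(accum.value)  # 10
sc.stop()
```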
@@ -1231,8 +1231,8 @@ storage levels is:
 </tr>
 </table>
 
-**Note:** *In Python, stored objects will always be serialized with the [Pickle](https://docs.python.org/2/library/pickle.html) library,
-so it does not matter whether you choose a serialized level. The available storage levels in Python include `MEMORY_ONLY`, `MEMORY_ONLY_2`,
+**Note:** *In Python, stored objects will always be serialized with the [Pickle](https://docs.python.org/2/library/pickle.html) library,
+so it does not matter whether you choose a serialized level. The available storage levels in Python include `MEMORY_ONLY`, `MEMORY_ONLY_2`,
 `MEMORY_AND_DISK`, `MEMORY_AND_DISK_2`, `DISK_ONLY`, and `DISK_ONLY_2`.*
 
 Spark also automatically persists some intermediate data in shuffle operations (e.g. `reduceByKey`), even without users calling `persist`. This is done to avoid recomputing the entire input if a node fails during the shuffle. We still recommend users call `persist` on the resulting RDD if they plan to reuse it.
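As a quick illustration of the note above (a sketch added for this review, not part of the patch), persisting a PySpark RDD with one of the listed storage levels:

```python
from pyspark import SparkContext, StorageLevel

sc = SparkContext("local[2]", "persist-demo")

rdd = sc.parallelize(range(1000)).map(lambda x: x * x)

# Python data is always pickled, so the level mainly chooses memory vs. disk.
rdd.persist(StorageLevel.MEMORY_AND_DISK)

print(rdd.count())  # first action computes and caches the partitions
print(rdd.sum())    # reuses the persisted partitions instead of recomputing
sc.stop()
```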
@@ -1374,7 +1374,7 @@ res2: Long = 10
 
 While this code used the built-in support for accumulators of type Long, programmers can also
 create their own types by subclassing [AccumulatorV2](api/scala/index.html#org.apache.spark.AccumulatorV2).
-The AccumulatorV2 abstract class has several methods which need to override:
+The AccumulatorV2 abstract class has several methods which need to override:
 `reset` for resetting the accumulator to zero, and `add` for add anothor value into the accumulator, `merge` for merging another same-type accumulator into this one. Other methods need to override can refer to scala API document. For example, supposing we had a `MyVector` class
 representing mathematical vectors, we could write:
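The hunk ends just before the guide's Scala `MyVector` example. As a rough Python-side analogue of the same idea (a custom accumulator with reset/add/merge semantics), one can subclass PySpark's `AccumulatorParam`; the `VectorParam` class and list-based vectors below are illustrative assumptions, not the guide's code:

```python
from pyspark import SparkContext
from pyspark.accumulators import AccumulatorParam

class VectorParam(AccumulatorParam):
    """Element-wise vector accumulator (PySpark counterpart of AccumulatorV2)."""
    def zero(self, value):
        return [0.0] * len(value)               # analogous to reset
    def addInPlace(self, v1, v2):
        return [a + b for a, b in zip(v1, v2)]  # analogous to add/merge

sc = SparkContext("local[2]", "vector-accumulator-demo")
vec = sc.accumulator([0.0, 0.0, 0.0], VectorParam())

sc.parallelize([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]).foreach(lambda row: vec.add(row))
print(vec.value)  # [5.0, 7.0, 9.0]
sc.stop()
```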