@@ -37,13 +37,14 @@ which allows Dask to take full advantage of multiple processors available on
most modern computers.

For more details on Dask, read `its documentation <http://dask.pydata.org/>`__.
+ Note that xarray only makes use of ``dask.array`` and ``dask.delayed``.

.. _dask.io:

Reading and writing data
------------------------

- The usual way to create a dataset filled with Dask arrays is to load the
+ The usual way to create a ``Dataset`` filled with Dask arrays is to load the
data from a netCDF file or files. You can do this by supplying a ``chunks``
argument to :py:func:`~xarray.open_dataset` or using the
:py:func:`~xarray.open_mfdataset` function.
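
A minimal sketch of the pattern this hunk documents, assuming a local netCDF
file named ``example-data.nc``:

.. code-block:: python

    import xarray as xr

    # Supplying ``chunks`` turns each variable into a Dask array rather than
    # loading it eagerly; here the time dimension is split into chunks of 10.
    ds = xr.open_dataset('example-data.nc', chunks={'time': 10})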
@@ -71,8 +72,8 @@ argument to :py:func:`~xarray.open_dataset` or using the

In this example ``latitude`` and ``longitude`` do not appear in the ``chunks``
dict, so only one chunk will be used along those dimensions. It is also
- entirely equivalent to opening a dataset using ``open_dataset`` and then
- chunking the data using the ``chunk`` method, e.g.,
+ entirely equivalent to opening a dataset using :py:func:`~xarray.open_dataset`
+ and then chunking the data using the ``chunk`` method, e.g.,
``xr.open_dataset('example-data.nc').chunk({'time': 10})``.

To open multiple files simultaneously, use :py:func:`~xarray.open_mfdataset`::
@@ -81,11 +82,14 @@ To open multiple files simultaneously, use :py:func:`~xarray.open_mfdataset`::

This function will automatically concatenate and merge datasets into one in
the simple cases that it understands (see :py:func:`~xarray.auto_combine`
- for the full disclaimer). By default, ``open_mfdataset`` will chunk each
+ for the full disclaimer). By default, :py:func:`~xarray.open_mfdataset` will chunk each
netCDF file into a single Dask array; again, supply the ``chunks`` argument to
control the size of the resulting Dask arrays. In more complex cases, you can
- open each file individually using ``open_dataset`` and merge the result, as
- described in :ref:`combining data`.
+ open each file individually using :py:func:`~xarray.open_dataset` and merge the result, as
+ described in :ref:`combining data`. If you have a distributed cluster running,
+ passing the keyword argument ``parallel=True`` to :py:func:`~xarray.open_mfdataset`
+ will speed up the reading of large multi-file datasets by executing those read tasks
+ in parallel using ``dask.delayed``.

You'll notice that printing a dataset still shows a preview of array values,
even if they are actually Dask arrays. We can do this quickly with Dask because
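
A minimal sketch of the ``parallel=True`` pattern documented above, assuming a
glob of netCDF files under ``my/files/`` and that a local ``dask.distributed``
cluster is acceptable:

.. code-block:: python

    import xarray as xr
    from dask.distributed import Client

    client = Client()  # starts a local cluster; point at a real scheduler in production

    # With parallel=True each file is opened in its own dask.delayed task,
    # and the pieces are then concatenated/merged into one Dataset.
    ds = xr.open_mfdataset('my/files/*.nc', parallel=True, chunks={'time': 10})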
@@ -105,7 +109,7 @@ usual way.
    ds.to_netcdf('manipulated-example-data.nc')

By setting the ``compute`` argument to ``False``, :py:meth:`~xarray.Dataset.to_netcdf`
- will return a Dask delayed object that can be computed later.
+ will return a ``dask.delayed`` object that can be computed later.

.. ipython:: python

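
A minimal sketch of the deferred write, using a small synthetic ``Dataset``
in place of real data:

.. code-block:: python

    import numpy as np
    import xarray as xr
    from dask.diagnostics import ProgressBar

    ds = xr.Dataset({'temperature': (('time',), np.random.randn(100))}).chunk({'time': 10})

    # With compute=False nothing is written yet; a dask.delayed object comes back.
    delayed_write = ds.to_netcdf('manipulated-example-data.nc', compute=False)

    with ProgressBar():
        delayed_write.compute()  # the actual write happens here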
@@ -146,7 +150,7 @@ enable label based indexing, xarray will automatically load coordinate labels
into memory.

The easiest way to convert an xarray data structure from lazy Dask arrays into
- eager, in-memory NumPy arrays is to use the :py:meth:`~xarray.Dataset.load` method:
+ *eager*, in-memory NumPy arrays is to use the :py:meth:`~xarray.Dataset.load` method:

.. ipython:: python

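
A minimal sketch of the lazy-to-eager conversion, again with a synthetic
chunked ``Dataset``:

.. code-block:: python

    import numpy as np
    import xarray as xr

    ds = xr.Dataset({'temperature': (('time',), np.random.randn(100))}).chunk({'time': 10})

    # load() computes every Dask array in place, leaving NumPy arrays behind;
    # compute() does the same but returns a new Dataset instead.
    ds.load()
    print(isinstance(ds['temperature'].data, np.ndarray))  # True once loaded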
@@ -189,6 +193,7 @@ across your machines and be much faster to use than reading repeatedly from
disk.

.. warning::
+
    On a single machine :py:meth:`~xarray.Dataset.persist` will try to load all of
    your data into memory. You should make sure that your dataset is not larger than
    available memory.
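
A minimal sketch of ``persist`` on a cluster, where ``scheduler-address:8786``
is a placeholder for a real scheduler:

.. code-block:: python

    import xarray as xr
    from dask.distributed import Client

    client = Client('scheduler-address:8786')  # placeholder address

    ds = xr.open_mfdataset('my/files/*.nc', parallel=True)

    # persist() starts computing in the background and keeps the resulting
    # chunks in the cluster's memory, so later operations reuse them.
    ds = ds.persist()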