Faster unstacking to sparse #5577

dcherian · 2021-07-05T17:20:59Z

Tests added
Passes pre-commit run --all-files
User visible changes (including notable bug fixes) are documented in whats-new.rst

For the 3D benchmark, this takes unstacking to sparse from 7 s to 25 ms and memory usage from 3.5 GB to 850 MB, by passing the coordinate locations directly to the sparse constructor =)
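Here is a minimal sketch of the idea. It assumes a 1-D array indexed by a pandas MultiIndex with unique entries and no missing labels. Under those assumptions, the MultiIndex codes are already the integer COO coordinates of each element in the unstacked array, so they can be handed straight to sparse.COO without building a dense intermediate. unstack_coords and unstack_to_sparse are hypothetical helpers for illustration. They are not xarray's API or the exact code in this PR, and the second one needs the pydata sparse package.

```python
import numpy as np
import pandas as pd


def unstack_coords(index: pd.MultiIndex):
    """Return (coords, shape) for unstacking a 1-D array indexed by `index`.

    Hypothetical helper. Assumes unique entries and no missing labels
    (pandas encodes a missing label as code -1).
    """
    # Drop level values that no longer occur, so the output shape is tight.
    index = index.remove_unused_levels()
    # codes[i][j] is the position of element j within level i's unique
    # values, i.e. element j's coordinate along unstacked dimension i.
    coords = np.stack([np.asarray(codes, dtype=np.intp) for codes in index.codes])
    shape = tuple(len(level) for level in index.levels)
    return coords, shape


def unstack_to_sparse(values, index: pd.MultiIndex, fill_value=np.nan):
    """Build the unstacked array directly in COO form (hypothetical helper)."""
    import sparse  # pydata/sparse

    coords, shape = unstack_coords(index)
    # Unique index entries mean the coordinates contain no duplicates, so
    # sparse can skip its duplicate-summing pass.
    return sparse.COO(
        coords,
        np.asarray(values),
        shape=shape,
        has_duplicates=False,
        fill_value=fill_value,
    )
```

For example, with idx = pd.MultiIndex.from_tuples([("a", 0), ("a", 2), ("b", 1)]), unstack_coords(idx) gives coordinates [[0, 0, 1], [0, 2, 1]] and shape (2, 3). unstack_to_sparse([1.0, 2.0, 3.0], idx).todense() is then [[1, nan, 2], [nan, 3, nan]]. Only the three stored values are ever allocated, however large the full grid is.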

asv run -e --bench unstacking.UnstackingSparse.time_unstack_to_sparse  --cpu-affinity=3 HEAD
[  0.00%] · For xarray commit c9251e1c <sparse-unstack>:
[  0.00%] ·· Building for conda-py3.8-bottleneck-dask-distributed-netcdf4-numpy-pandas-scipy-sparse
[  0.00%] ·· Benchmarking conda-py3.8-bottleneck-dask-distributed-netcdf4-numpy-pandas-scipy-sparse
[  0.01%] ··· Running (unstacking.UnstackingSparse.time_unstack_to_sparse_2d--)..
[  0.02%] ··· unstacking.UnstackingSparse.time_unstack_to_sparse_2d    623±30μs
[  0.02%] ··· unstacking.UnstackingSparse.time_unstack_to_sparse_3d    22.8±2ms
[  0.06%] ··· unstacking.UnstackingSparse.peakmem_unstack_to_sparse_2d    793M
[  0.06%] ··· unstacking.UnstackingSparse.peakmem_unstack_to_sparse_3d    794M


[  0.04%] · For xarray commit 80905135 <main>:
[  0.04%] ·· Building for conda-py3.8-bottleneck-dask-distributed-netcdf4-numpy-pandas-scipy-sparse..
[  0.04%] ·· Benchmarking conda-py3.8-bottleneck-dask-distributed-netcdf4-numpy-pandas-scipy-sparse
[  0.05%] ··· Running (unstacking.UnstackingSparse.time_unstack_to_sparse_2d--)..
[  0.06%] ··· unstacking.UnstackingSparse.time_unstack_to_sparse_2d    596±30ms
[  0.06%] ··· unstacking.UnstackingSparse.time_unstack_to_sparse_3d    7.72±0.1s
[  0.02%] ··· unstacking.UnstackingSparse.peakmem_unstack_to_sparse_2d    867M
[  0.02%] ··· unstacking.UnstackingSparse.peakmem_unstack_to_sparse_3d    3.56G
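The figures above follow asv's naming conventions. Methods prefixed time_ are timed, and methods prefixed peakmem_ report peak memory usage. asv calls setup beforehand and does not count its cost in the timings. Below is a hypothetical benchmark class of that shape. The class name, array sizes and data layout are assumptions for illustration, not the benchmark added in this PR. It uses only the public DataArray.stack / isel / unstack(sparse=True) API, and running it needs xarray and sparse installed.

```python
import numpy as np


class UnstackingSparseSketch:
    """Hypothetical asv benchmark class; not xarray's actual benchmark file.

    asv times every method whose name starts with ``time_`` and records the
    peak memory of every method starting with ``peakmem_``.
    """

    def setup(self):
        # Imported lazily so that asv's benchmark discovery stays cheap.
        import xarray as xr

        # Stack a 100 x 100 grid into a single "z" dimension, then keep only
        # the 100 diagonal points (stacked positions 0, 101, 202, ...), so that
        # unstacking back to ("x", "y") leaves 99% of the cells missing.
        full = xr.DataArray(np.ones((100, 100)), dims=("x", "y")).stack(z=("x", "y"))
        self.da_eye_2d = full.isel(z=slice(None, None, 101))

    def time_unstack_to_sparse_2d(self):
        self.da_eye_2d.unstack("z", sparse=True)

    def peakmem_unstack_to_sparse_2d(self):
        self.da_eye_2d.unstack("z", sparse=True)
```

A very sparse "eye" layout like this is where building the result directly from coordinates should matter most. A dense intermediate would be 99% fill values.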

cc @bonnland


github-actions · 2021-07-05T17:28:09Z

Unit Test Results

        6 files         6 suites 53m 48s ⏱️
16 281 tests 14 545 ✔️ 1 736 💤 0 ❌
90 882 runs 82 702 ✔️ 8 180 💤 0 ❌

Results for commit 267a14f.

♻️ This comment has been updated with latest results.

max-sixty · 2021-07-05T18:37:06Z

From 7s to 25 ms

Casual!


* upstream/main: (34 commits) Use same bool validator as other inputs (pydata#5703) conditionally disable bottleneck (pydata#5560) Refactor index vs. coordinate variable(s) (pydata#5636) pre-commit: autoupdate hook versions (pydata#5685) Flexible Indexes: Avoid len(index) in map_blocks (pydata#5670) Speed up _mapping_repr (pydata#5661) update the link to `scipy`'s intersphinx file (pydata#5665) Bump styfle/cancel-workflow-action from 0.9.0 to 0.9.1 (pydata#5663) pre-commit: autoupdate hook versions (pydata#5660) fix the binder environment (pydata#5650) Update api.rst (pydata#5639) Kwargs to rasterio open (pydata#5609) Bump codecov/codecov-action from 1 to 2.0.2 (pydata#5633) new blank whats-new for v0.19.1 v0.19.0 release notes (pydata#5632) remove deprecations scheduled for 0.19 (pydata#5630) Make typing-extensions optional (pydata#5624) Plots get labels from pint arrays (pydata#5561) Add to_numpy() and as_numpy() methods (pydata#5568) pin fsspec (pydata#5627) ...

Illviljan · 2021-10-29T23:16:53Z

       before           after         ratio
     [36f05d70]       [0310ebec]
-           2.98G             204M     0.07  unstacking.UnstackingSparse.peakmem_unstack_to_sparse_3d [fv-az292-755/conda-py3.8-bottleneck-dask-distributed-netcdf4-numpy-pandas-scipy-sparse]
-              3G             204M     0.07  unstacking.UnstackingSparse.peakmem_unstack_to_sparse_3d [fv-az292-755/conda-py3.8-dask-distributed-netcdf4-numpy-pandas-scipy-sparse]
-      10.2±0.02s         29.7±2ms     0.00  unstacking.UnstackingSparse.time_unstack_to_sparse_3d [fv-az292-755/conda-py3.8-bottleneck-dask-distributed-netcdf4-numpy-pandas-scipy-sparse]
-      10.1±0.05s       27.4±0.6ms     0.00  unstacking.UnstackingSparse.time_unstack_to_sparse_3d [fv-az292-755/conda-py3.8-dask-distributed-netcdf4-numpy-pandas-scipy-sparse]
-        714±20ms         945±30μs     0.00  unstacking.UnstackingSparse.time_unstack_to_sparse_2d [fv-az292-755/conda-py3.8-bottleneck-dask-distributed-netcdf4-numpy-pandas-scipy-sparse]
-         721±8ms         923±30μs     0.00  unstacking.UnstackingSparse.time_unstack_to_sparse_2d [fv-az292-755/conda-py3.8-dask-distributed-netcdf4-numpy-pandas-scipy-sparse]

Quite the improvement indeed. :)
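In asv compare output like the table above, the ratio column is after / before, and a leading - marks an improvement. The sketch below recomputes a few of the rows. parse_asv_value and asv_ratio are hypothetical helpers. Treating k/M/G as powers of 1000 is an assumption about asv's formatting, but it does not change the rounded ratios shown here.

```python
import re

# Scale factors for the suffixes that appear in the table above (assumed
# decimal for byte sizes); both Unicode "micro" characters are accepted.
_UNITS = {
    "s": 1.0, "ms": 1e-3, "μs": 1e-6, "µs": 1e-6, "us": 1e-6,
    "": 1.0, "k": 1e3, "M": 1e6, "G": 1e9,
}


def parse_asv_value(text: str) -> float:
    """Parse strings like '10.2±0.02s', '945±30μs' or '2.98G' (hypothetical helper)."""
    # number, optional ±spread (ignored), then an optional unit suffix
    match = re.fullmatch(r"([\d.]+)(?:±[\d.]+)?\s*([a-zA-Zμµ]*)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised asv value: {text!r}")
    number, unit = match.groups()
    return float(number) * _UNITS[unit]


def asv_ratio(before: str, after: str) -> float:
    """after / before, rounded to 2 places like the ratio column."""
    return round(parse_asv_value(after) / parse_asv_value(before), 2)
```

With these helpers, asv_ratio("2.98G", "204M") gives 0.07, matching the peak-memory rows. The timing rows print as 0.00 because, for example, 29.7 ms / 10.2 s is about 0.003, which rounds to zero at two decimal places.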


* upstream/main: (39 commits) Fixed a mispelling of dimension in dataarray documentation for from_dict (pydata#6020) [pre-commit.ci] pre-commit autoupdate (pydata#6014) [pre-commit.ci] pre-commit autoupdate (pydata#5990) Use set_options for asv bottleneck tests (pydata#5986) Fix module name retrieval in `backend.plugins.remove_duplicates()`, plugin tests (pydata#5959) Check for py version instead of try/except when importing entry_points (pydata#5988) Add "see also" in to_dataframe docs (pydata#5978) Alternate method using inline css to hide regular html output in an untrusted notebook (pydata#5880) Fix mypy issue with entry_points (pydata#5979) Remove pre-commit auto update (pydata#5958) Do not change coordinate inplace when throwing error (pydata#5957) Create CITATION.cff (pydata#5956) Add groupby & resample benchmarks (pydata#5922) Fix plot.line crash for data of shape (1, N) in _title_for_slice on format_item (pydata#5948) Disable unit test comments (pydata#5946) Publish test results from workflow_run only (pydata#5947) Generator for groupby reductions (pydata#5871) whats-new dev whats-new for 0.20.1 (pydata#5943) Docs: fix URL for PTSA (pydata#5935) ...

dcherian · 2021-12-02T01:50:12Z

@pydata/xarray I'm planning to merge on Friday. It's been sitting around for a while and is a giant improvement.

* upstream/main: fix grammatical typo in docs (pydata#6034) Use condas dask-core in ci instead of dask to speedup ci and reduce dependencies (pydata#6007) Use complex nan by default when interpolating out of bounds (pydata#6019) Simplify missing value handling in xarray.corr (pydata#6025) Add pyXpcm to Related Projects doc page (pydata#6031) Make xr.corr and xr.map_blocks work without dask (pydata#5731)


Faster unstacking to sparse

9ac1e07

dcherian commented Jul 5, 2021

xarray/core/variable.py (outdated, resolved)

Update xarray/core/variable.py

6bd0fe7

dcherian added the needs review label Jul 5, 2021

[skip-ci] Add memory benchmarks

e976ada

dcherian added the topic-arrays related to flexible array support label Jul 5, 2021

max-sixty reviewed Jul 5, 2021

xarray/core/variable.py (outdated, resolved)

max-sixty reviewed Jul 5, 2021

xarray/core/variable.py (outdated, resolved)

cleanups + add comments

e4a6ec2

dcherian force-pushed the sparse-unstack branch from fa201bd to e4a6ec2 on July 5, 2021 20:45

optimize.

0c6f22f

dcherian mentioned this pull request Jul 6, 2021

Faster unstacking of dask arrays #5582

Open

bugfix

6e12955

dcherian commented Jul 7, 2021

doc/whats-new.rst (outdated, resolved)

dcherian and others added 5 commits July 7, 2021 09:21

[skip-ci] Update doc/whats-new.rst

8e6c548

clean up comments

637421d

FIx whats-new

58aa601

Merge branch 'main' into sparse-unstack

267a14f

dcherian added the run-benchmark Run the ASV benchmark workflow label Oct 28, 2021

Illviljan reviewed Nov 8, 2021

xarray/core/variable.py (resolved)

dcherian added 2 commits November 23, 2021 19:52

faster benchmarks

ea22454

dcherian added the plan to merge Final call for comments label Nov 24, 2021

make fewer assumptions

97e6915

Fix whats-new

b7017af

dcherian commented Dec 2, 2021

doc/whats-new.rst (outdated, resolved)

Update doc/whats-new.rst

1532c5e

dcherian force-pushed the sparse-unstack branch from 9adc72c to 1532c5e on December 2, 2021 02:16

dcherian merged commit cdfcf37 into pydata:main Dec 3, 2021

dcherian deleted the sparse-unstack branch December 3, 2021 16:38
