dcherian
Contributor

@dcherian dcherian commented Jul 5, 2021

  • Tests added
  • Passes pre-commit run --all-files
  • User visible changes (including notable bug fixes) are documented in whats-new.rst

Unstacking to a sparse array goes from 7 s to 25 ms, and peak memory usage from 3.5 GB to 850 MB =) by passing the coordinate locations directly to the sparse constructor.
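As a rough illustration of what "passing the coordinate locations directly" can mean, here is a minimal NumPy-only sketch. It is not the code in this PR: the integer codes, values, and helper names (`coo_parts`, `densify`) are hypothetical. The idea is that the integer position of each stacked point along each new dimension already forms the `(ndim, nnz)` coordinate array that a COO constructor takes, so no dense intermediate is needed.

```python
import numpy as np

# Hypothetical stacked data: 4 observed points out of a 3 x 4 (x, y) grid.
x_codes = np.array([0, 0, 1, 2])  # integer position of each point along "x"
y_codes = np.array([1, 3, 0, 2])  # integer position of each point along "y"
values = np.array([10.0, 11.0, 12.0, 13.0])
shape = (3, 4)


def coo_parts(codes, values):
    """Return the (coords, data) pair describing a COO array, without going dense."""
    coords = np.stack(codes)  # shape (ndim, nnz)
    return coords, values


def densify(coords, data, shape, fill_value=np.nan):
    """Scatter a (coords, data) pair into a dense array; used here only as a check."""
    out = np.full(shape, fill_value)
    out[tuple(coords)] = data
    return out


coords, data = coo_parts([x_codes, y_codes], values)
dense = densify(coords, data, shape)
print(coords.shape)  # (ndim, nnz)
print(dense)
```

With the `sparse` package installed, the same pair could be handed straight to `sparse.COO(coords, data, shape=shape, fill_value=np.nan)`; only the 4 stored values and their coordinates are materialized, rather than all 12 grid cells.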

asv run -e --bench unstacking.UnstackingSparse.time_unstack_to_sparse  --cpu-affinity=3 HEAD
[  0.00%] · For xarray commit c9251e1c <sparse-unstack>:
[  0.00%] ·· Building for conda-py3.8-bottleneck-dask-distributed-netcdf4-numpy-pandas-scipy-sparse
[  0.00%] ·· Benchmarking conda-py3.8-bottleneck-dask-distributed-netcdf4-numpy-pandas-scipy-sparse
[  0.01%] ··· Running (unstacking.UnstackingSparse.time_unstack_to_sparse_2d--)..
[  0.02%] ··· unstacking.UnstackingSparse.time_unstack_to_sparse_2d    623±30μs
[  0.02%] ··· unstacking.UnstackingSparse.time_unstack_to_sparse_3d    22.8±2ms
[  0.06%] ··· unstacking.UnstackingSparse.peakmem_unstack_to_sparse_2d    793M
[  0.06%] ··· unstacking.UnstackingSparse.peakmem_unstack_to_sparse_3d    794M


[  0.04%] · For xarray commit 80905135 <main>:
[  0.04%] ·· Building for conda-py3.8-bottleneck-dask-distributed-netcdf4-numpy-pandas-scipy-sparse..
[  0.04%] ·· Benchmarking conda-py3.8-bottleneck-dask-distributed-netcdf4-numpy-pandas-scipy-sparse
[  0.05%] ··· Running (unstacking.UnstackingSparse.time_unstack_to_sparse_2d--)..
[  0.06%] ··· unstacking.UnstackingSparse.time_unstack_to_sparse_2d    596±30ms
[  0.06%] ··· unstacking.UnstackingSparse.time_unstack_to_sparse_3d    7.72±0.1s
[  0.02%] ··· unstacking.UnstackingSparse.peakmem_unstack_to_sparse_2d    867M
[  0.02%] ··· unstacking.UnstackingSparse.peakmem_unstack_to_sparse_3d    3.56G

cc @bonnland

@github-actions
Contributor

github-actions bot commented Jul 5, 2021

Unit Test Results

         6 files           6 suites   53m 48s ⏱️
16 281 tests 14 545 ✔️ 1 736 💤 0
90 882 runs  82 702 ✔️ 8 180 💤 0

Results for commit 267a14f.

♻️ This comment has been updated with latest results.

@dcherian dcherian added the topic-arrays related to flexible array support label Jul 5, 2021
@max-sixty
Collaborator

From 7s to 25 ms

Casual!

dcherian and others added 5 commits July 7, 2021 09:21
* upstream/main: (34 commits)
  Use same bool validator as other inputs (pydata#5703)
  conditionally disable bottleneck (pydata#5560)
  Refactor index vs. coordinate variable(s) (pydata#5636)
  pre-commit: autoupdate hook versions (pydata#5685)
  Flexible Indexes: Avoid len(index) in map_blocks (pydata#5670)
  Speed up _mapping_repr (pydata#5661)
  update the link to `scipy`'s intersphinx file (pydata#5665)
  Bump styfle/cancel-workflow-action from 0.9.0 to 0.9.1 (pydata#5663)
  pre-commit: autoupdate hook versions (pydata#5660)
  fix the binder environment (pydata#5650)
  Update api.rst (pydata#5639)
  Kwargs to rasterio open (pydata#5609)
  Bump codecov/codecov-action from 1 to 2.0.2 (pydata#5633)
  new blank whats-new for v0.19.1
  v0.19.0 release notes (pydata#5632)
  remove deprecations scheduled for 0.19 (pydata#5630)
  Make typing-extensions optional (pydata#5624)
  Plots get labels from pint arrays (pydata#5561)
  Add to_numpy() and as_numpy() methods (pydata#5568)
  pin fsspec (pydata#5627)
  ...
@dcherian dcherian added the run-benchmark Run the ASV benchmark workflow label Oct 28, 2021
@Illviljan
Contributor

       before           after         ratio
     [36f05d70]       [0310ebec]
-           2.98G             204M     0.07  unstacking.UnstackingSparse.peakmem_unstack_to_sparse_3d [fv-az292-755/conda-py3.8-bottleneck-dask-distributed-netcdf4-numpy-pandas-scipy-sparse]
-              3G             204M     0.07  unstacking.UnstackingSparse.peakmem_unstack_to_sparse_3d [fv-az292-755/conda-py3.8-dask-distributed-netcdf4-numpy-pandas-scipy-sparse]
-      10.2±0.02s         29.7±2ms     0.00  unstacking.UnstackingSparse.time_unstack_to_sparse_3d [fv-az292-755/conda-py3.8-bottleneck-dask-distributed-netcdf4-numpy-pandas-scipy-sparse]
-      10.1±0.05s       27.4±0.6ms     0.00  unstacking.UnstackingSparse.time_unstack_to_sparse_3d [fv-az292-755/conda-py3.8-dask-distributed-netcdf4-numpy-pandas-scipy-sparse]
-        714±20ms         945±30μs     0.00  unstacking.UnstackingSparse.time_unstack_to_sparse_2d [fv-az292-755/conda-py3.8-bottleneck-dask-distributed-netcdf4-numpy-pandas-scipy-sparse]
-         721±8ms         923±30μs     0.00  unstacking.UnstackingSparse.time_unstack_to_sparse_2d [fv-az292-755/conda-py3.8-dask-distributed-netcdf4-numpy-pandas-scipy-sparse]

Quite the improvement indeed. :)
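The `ratio` column in the table above is after/before, printed to two decimals, which is why the timing rows show `0.00`. A small sketch of that arithmetic, using the bottleneck rows from the table and assuming `G`, `M`, `ms`, and `μs` are plain decimal multiples:

```python
# Values copied from the bottleneck rows of the comparison table above.
# Assumption: G = 1e9 bytes and M = 1e6 bytes (decimal multiples).
before_after = {
    "peakmem_unstack_to_sparse_3d": (2.98e9, 204e6),  # bytes
    "time_unstack_to_sparse_3d": (10.2, 29.7e-3),     # seconds
    "time_unstack_to_sparse_2d": (714e-3, 945e-6),    # seconds
}


def ratio(before, after):
    """Ratio as reported in the comparison table: after divided by before."""
    return after / before


ratios = {name: ratio(b, a) for name, (b, a) in before_after.items()}

for name, r in ratios.items():
    # 1 / r is the improvement factor hidden by the two-decimal rounding.
    print(f"{name}: ratio={r:.4f}, improvement={1 / r:.0f}x")
```

Read this way, the timing improvements are in the hundreds, not merely "below 0.01".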

* upstream/main: (39 commits)
  Fixed a mispelling of dimension in dataarray documentation for from_dict (pydata#6020)
  [pre-commit.ci] pre-commit autoupdate (pydata#6014)
  [pre-commit.ci] pre-commit autoupdate (pydata#5990)
  Use set_options for asv bottleneck tests (pydata#5986)
  Fix module name retrieval in `backend.plugins.remove_duplicates()`, plugin tests (pydata#5959)
  Check for py version instead of try/except when importing entry_points (pydata#5988)
  Add "see also" in to_dataframe docs (pydata#5978)
  Alternate method using inline css to hide regular html output in an untrusted notebook (pydata#5880)
  Fix mypy issue with entry_points (pydata#5979)
  Remove pre-commit auto update (pydata#5958)
  Do not change coordinate inplace when throwing error (pydata#5957)
  Create CITATION.cff (pydata#5956)
  Add groupby & resample benchmarks (pydata#5922)
  Fix plot.line crash for data of shape (1, N) in _title_for_slice on format_item (pydata#5948)
  Disable unit test comments (pydata#5946)
  Publish test results from workflow_run only (pydata#5947)
  Generator for groupby reductions (pydata#5871)
  whats-new dev
  whats-new for 0.20.1 (pydata#5943)
  Docs: fix URL for PTSA (pydata#5935)
  ...
@dcherian dcherian added the plan to merge Final call for comments label Nov 24, 2021
@dcherian
Contributor Author

dcherian commented Dec 2, 2021

@pydata/xarray I'm planning to merge on Friday. It's been sitting around for a while and is a giant improvement.

* upstream/main:
  fix grammatical typo in docs (pydata#6034)
  Use condas dask-core in ci instead of dask to speedup ci and reduce dependencies (pydata#6007)
  Use complex nan by default when interpolating out of bounds (pydata#6019)
  Simplify missing value handling in xarray.corr (pydata#6025)
  Add pyXpcm to Related Projects doc page (pydata#6031)
  Make xr.corr and xr.map_blocks work without dask (pydata#5731)
@dcherian dcherian merged commit cdfcf37 into pydata:main Dec 3, 2021
@dcherian dcherian deleted the sparse-unstack branch December 3, 2021 16:38
Labels
  • needs review
  • plan to merge: Final call for comments
  • run-benchmark: Run the ASV benchmark workflow
  • topic-arrays: related to flexible array support
5 participants