Avoid calling np.asarray on lazy indexing classes #6874

dcherian · 2022-08-03T15:13:00Z

This is motivated by https://docs.rapids.ai/api/kvikio/stable/api.html#kvikio.zarr.GDSStore, which, on read, loads the data directly into GPU memory.

Currently we rely on np.asarray to convert a BackendArray wrapped in a number of lazy indexing classes to a real array. This breaks for GDSStore because the underlying array is a cupy array, and calling np.asarray on it raises an error.

np.asarray raises if __array__ returns anything other than a NumPy array, so we need to use something else.
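A minimal, self-contained illustration of that failure mode, assuming only NumPy is installed. `FakeGPUArray` is a hypothetical stand-in for a device array; real cupy arrays refuse implicit conversion in their own way, but the point here is the protocol rule that `__array__` must return an `ndarray`.

```python
import numpy as np


class FakeGPUArray:
    """Hypothetical stand-in for a device array such as cupy.

    Its __array__ returns a plain list instead of a NumPy ndarray,
    which violates the __array__ protocol.
    """

    def __init__(self, values):
        self.values = list(values)

    def __array__(self, dtype=None, copy=None):
        # Returning a non-ndarray here is exactly what NumPy rejects.
        return self.values


def try_asarray(obj):
    """Return (ok, result_or_error_message) for np.asarray(obj)."""
    try:
        return True, np.asarray(obj)
    except ValueError as err:
        return False, str(err)


ok, message = try_asarray(FakeGPUArray([1, 2, 3]))
print(ok, message)
```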

Here I added get_array, which, like np.array, recurses down until it receives a duck array.
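A rough sketch of the recursive unwrapping idea, under stated assumptions: `LazilyIndexed` and `FakeGPUArray` are hypothetical toy classes, not xarray's or cupy's actual code. Each wrapper asks its inner wrapper for a duck array, then applies its own indexing key, so the device array type survives all the way out and nothing is converted to NumPy.

```python
class FakeGPUArray:
    """Hypothetical duck array standing in for a cupy array."""

    def __init__(self, values):
        self.values = list(values)

    def __getitem__(self, key):
        # Indexing stays "on device": the result is another FakeGPUArray.
        return FakeGPUArray(self.values[key])

    def __array__(self, dtype=None, copy=None):
        raise TypeError("implicit conversion to NumPy is not allowed")


class LazilyIndexed:
    """Hypothetical lazy indexing wrapper (not xarray's actual class)."""

    def __init__(self, array, key):
        self.array = array  # another LazilyIndexed or a duck array
        self.key = key

    def get_duck_array(self):
        inner = self.array
        # Recurse through nested lazy wrappers until a duck array appears.
        if isinstance(inner, LazilyIndexed):
            inner = inner.get_duck_array()
        # Apply this layer's key without ever calling np.asarray.
        return inner[self.key]


data = FakeGPUArray([10, 20, 30, 40])
lazy = LazilyIndexed(LazilyIndexed(data, slice(1, None)), slice(None, 2))
result = lazy.get_duck_array()
print(type(result).__name__, result.values)
```

The inner wrapper selects `[20, 30, 40]`, the outer one keeps the first two elements, and the result is still a `FakeGPUArray`.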

Quite a few things are probably broken, but I'd like feedback on the approach.

I considered np.asanyarray(..., like=...), but that would require the lazy indexing classes to know what they're wrapping, which doesn't seem right.

Ref: xarray-contrib/cupy-xarray#10 which adds a kvikio backend entrypoint

This returns the underlying array type instead of always casting to np.array. This is necessary for Zarr stores where the Zarr Array wraps a cupy array (for example, kvikio.zarr.GDSStore). In that case, we cannot call np.asarray because __array__ is expected to always return a numpy array. We use get_array in Variable.data to make sure we don't load arrays from such GDSStores.

instead of always casting to np.asarray

for more information, see https://pre-commit.ci

shoyer · 2022-08-03T17:44:20Z

As I understand it, the main purpose here is to strip away Xarray's lazy indexing classes.

Maybe call this get_duck_array(), just to be a little more descriptive?

Clean up short_array_repr.

xarray/core/indexing.py

xarray/core/common.py

xarray/core/indexing.py

dcherian · 2023-01-24T21:16:16Z

xarray/core/indexing.py

```diff
+        # so we need the explicit check for ExplicitlyIndexed
+        if isinstance(array, ExplicitlyIndexed):
+            array = array.get_duck_array()
+        return _wrap_numpy_scalars(array)
```


Adding _wrap_numpy_scalars allows us to handle scalars returned by the backend. This seems OK to me: it places fewer restrictions on backends and is backward compatible.

xarray/xarray/core/indexing.py

Lines 607 to 612 in 3ee7b5a

```python
def _wrap_numpy_scalars(array):
    """Wrap NumPy scalars in 0d arrays."""
    if np.isscalar(array):
        return np.array(array)
    else:
        return array
```
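To see how the diff fragment and `_wrap_numpy_scalars` fit together, here is a small runnable sketch. `ScalarReturningBackend` and the minimal `ExplicitlyIndexed`/`LazilyIndexedArray` classes are hypothetical toys, not xarray's real implementations; the backend mimics one that hands back a NumPy scalar on integer indexing.

```python
import numpy as np


def _wrap_numpy_scalars(array):
    """Wrap NumPy scalars in 0d arrays."""
    if np.isscalar(array):
        return np.array(array)
    return array


class ExplicitlyIndexed:
    """Hypothetical minimal base class for lazy wrappers."""


class ScalarReturningBackend:
    """Hypothetical backend whose integer indexing yields NumPy scalars."""

    def __init__(self, data):
        self._data = np.asarray(data)

    def __getitem__(self, key):
        return self._data[key]


class LazilyIndexedArray(ExplicitlyIndexed):
    """Hypothetical lazy wrapper around a backend array."""

    def __init__(self, array, key):
        self.array = array
        self.key = key

    def get_duck_array(self):
        array = self.array[self.key]
        # The backend may hand back another lazy wrapper instead of a
        # duck array, so we need the explicit check for ExplicitlyIndexed.
        if isinstance(array, ExplicitlyIndexed):
            array = array.get_duck_array()
        # A NumPy scalar from the backend becomes a 0d ndarray here.
        return _wrap_numpy_scalars(array)


backend = ScalarReturningBackend([1.5, 2.5])
point = LazilyIndexedArray(backend, 1).get_duck_array()
full = LazilyIndexedArray(backend, slice(None)).get_duck_array()
print(type(point).__name__, point.ndim, full.shape)
```

Note that `np.isscalar` is `False` for a 0d ndarray, so backends that already return arrays pass through unchanged.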

The remaining issue is that we should pass an appropriate like argument to np.array, but I don't see how to do that starting from a scalar.

The good news is that backends can avoid this complication by returning arrays, so we could just ignore this ugly bit for now.
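A sketch of what passing `like` would look like if a reference array were available, assuming NumPy 1.20 or later (NEP 35). `wrap_scalar_like` and its `reference` parameter are hypothetical; the parameter is exactly the information a bare scalar does not carry, which is the problem described above.

```python
import numpy as np


def wrap_scalar_like(value, reference):
    """Hypothetical helper: wrap `value` as a 0d array matching `reference`.

    `like=` dispatches np.array through reference's __array_function__,
    so a reference from a library that implements that protocol for
    np.array could yield that library's array type. A bare scalar has
    no such reference, so the caller must supply one.
    """
    if np.isscalar(value):
        return np.array(value, like=reference)
    return value


out = wrap_scalar_like(np.float64(3.0), reference=np.empty(0))
print(type(out).__name__, out.ndim)
```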

xarray/backends/common.py

xarray/core/indexing.py

xarray/tests/test_dataset.py

xarray/tests/__init__.py

Co-authored-by: Illviljan <[email protected]> Co-authored-by: Stephan Hoyer <[email protected]>

Co-authored-by: Illviljan <[email protected]>

dcherian · 2023-02-16T19:52:38Z

@Illviljan feel free to push any typing changes to this PR. I think that would really help clarify the interface. I tried adding a DuckArray type but didn't get very far.

Illviljan · 2023-02-16T20:34:16Z

I don't have a better idea than DuckArray = Any # ndarray/cupy/sparse etc. and using that as the return type, but that won't change anything mypy-wise; it only makes the code easier for us to read.

dcherian · 2023-02-16T22:03:22Z

T_ExplicitlyIndexed may be another useful thing to add.
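One possible shape for the two typing ideas above, as a sketch only: the `DuckArray` alias is documentary (mypy treats `Any` as unchecked), and `T_ExplicitlyIndexed` is assumed here to be a `TypeVar` bound to the base class so that methods such as a hypothetical `copy()` keep the subclass type. The classes are toys, not xarray's code.

```python
from typing import Any, TypeVar

import numpy as np

# Documentary alias only: mypy treats Any as unchecked.
DuckArray = Any  # ndarray / cupy / sparse etc.

T_ExplicitlyIndexed = TypeVar("T_ExplicitlyIndexed", bound="ExplicitlyIndexed")


class ExplicitlyIndexed:
    """Hypothetical minimal base class for lazy indexing wrappers."""

    def __init__(self, array: Any) -> None:
        self.array = array

    def get_duck_array(self) -> DuckArray:
        inner = self.array
        if isinstance(inner, ExplicitlyIndexed):
            return inner.get_duck_array()
        return inner

    def copy(self: T_ExplicitlyIndexed) -> T_ExplicitlyIndexed:
        # The bound TypeVar lets a type checker infer that a subclass's
        # copy() returns that subclass, not just ExplicitlyIndexed.
        return type(self)(self.array)


class LazilyIndexedArray(ExplicitlyIndexed):
    pass


wrapped = LazilyIndexedArray(ExplicitlyIndexed(np.arange(3)))
print(type(wrapped.copy()).__name__, wrapped.get_duck_array())
```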

xarray/core/indexing.py

Co-authored-by: Illviljan <[email protected]>

for more information, see https://pre-commit.ci

* main: (40 commits) Faq pull request (According to pull request pydata#7604 & issue pydata#1285 (pydata#7638) add timeouts for tests (pydata#7657) Pull Request Labeler - Undo workaround sync-labels bug (pydata#7667) [pre-commit.ci] pre-commit autoupdate (pydata#7651) Allow all integer dtypes in `polyval` (pydata#7619) [skip-ci] dev whats-new (pydata#7660) Redo whats-new for 2023.03.0 (pydata#7659) Set copy=False when calling pd.Series (pydata#7642) Pin pandas < 2 (pydata#7650) Whats-new for release 2023.03.0 (pydata#7643) Bump pypa/gh-action-pypi-publish from 1.7.1 to 1.8.1 (pydata#7648) Use more descriptive link texts (pydata#7625) Fix missing 'dim' argument in _get_nan_block_lengths (pydata#7598) Fix `pcolormesh` with str coords (pydata#7612) [skip-ci] Fix groupby binary ops benchmarks (pydata#7603) Remove incomplete sentence in IO docs (pydata#7631) Allow indexing unindexed dimensions using dask arrays (pydata#5873) Bump pypa/gh-action-pypi-publish from 1.6.4 to 1.7.1 (pydata#7618) [pre-commit.ci] pre-commit autoupdate (pydata#7620) add a test for scatter colorbar extend (pydata#7616) ...

dcherian · 2023-03-26T19:58:05Z

I'd like to merge this at the end of next week.

It now has tests and should be backwards compatible with external backends.

A good next step would be to finish up #7020

dcherian and others added 6 commits August 3, 2022 09:08

Rename to short_array_repr; use Variable.data

9c0350c

instead of always casting to np.asarray

Fix Variable.load

74afa53

Make get_array recursive.

9de7427

Some cleanups

cc0a653

[pre-commit.ci] auto fixes from pre-commit.com hooks

59c7ead

for more information, see https://pre-commit.ci

dcherian mentioned this pull request Aug 3, 2022

Add Kvikio backend entrypoint xarray-contrib/cupy-xarray#10

Closed

8 tasks

Add get_array to PandasIndexingAdaptor

2aa0830

dcherian added 3 commits August 3, 2022 11:50

Finish short_array_repr refactoring

1306758

Rename to get_duck_array

cf67972

Try without hasattr check

0209900

dcherian changed the title ~~Avoid calling np.asarray on backend arrays~~ Avoid calling np.asarray on lazy indexing classes Aug 3, 2022

dcherian added 3 commits August 3, 2022 12:23

Return bare array from LazilyIndexedArray.get_duck_array

536648a

Add get_duck_array to AbstractArray

f2514c7

Clean up short_array_repr.

Fix zerodim test

3c597d4

dcherian commented Aug 3, 2022

xarray/core/indexing.py (outdated)

dcherian marked this pull request as ready for review August 3, 2022 20:57

dcherian requested a review from shoyer August 3, 2022 20:58

dcherian added 2 commits August 5, 2022 15:42

Fix LazilyVectorizedIndexedArray

201eeba

Inherit __array__ from ExplicitlyIndexed

cd02a8a

dcherian force-pushed the kvikio branch from 0753803 to cd02a8a August 5, 2022 21:48

dcherian and others added 4 commits August 5, 2022 15:59

Fix InaccessibleArray in tests

7ef55e0

Fix BackendArray

4e77fec

Merge branch 'main' into kvikio

d14c61f

Merge branch 'main' into kvikio

19af950

shoyer reviewed Aug 10, 2022

xarray/core/common.py (outdated)

xarray/core/indexing.py (outdated)

dcherian added 3 commits August 10, 2022 11:19

reprs Use .data on AbstractArray

22db817

Force netCDF and h5netCDF to return arrays

2bbcc16

Add whats-new

ca2a10a

dcherian added 3 commits January 20, 2023 16:26

Guard np.asarray for scalars.

9815b75

Revert casting to arrays in backend

39e7529

Wrap numpy scalars in Explicitly*Indexed*.get_duck_array

f304bcb

dcherian commented Jan 24, 2023


dcherian added 2 commits January 26, 2023 10:01

Merge branch 'main' into kvikio

6cb1677

Merge branch 'main' into kvikio

2c7da96

Illviljan reviewed Feb 16, 2023


dcherian and others added 2 commits February 16, 2023 12:51

Apply suggestions from code review

26d224c

Co-authored-by: Illviljan <[email protected]> Co-authored-by: Stephan Hoyer <[email protected]>

Update xarray/tests/__init__.py

65da209

Co-authored-by: Illviljan <[email protected]>

dcherian commented Mar 3, 2023

xarray/core/indexing.py (outdated)

dcherian and others added 9 commits March 3, 2023 16:49

Update xarray/core/indexing.py

0bc1175

Apply suggestions from code review

8c2d74c

Co-authored-by: Illviljan <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

20c8c81

for more information, see https://pre-commit.ci

Bring back the ugly check

77f7059

Update whats-new

5c23bd2

Fix pre-commit

887e1c5

silence mypy error

2557d02

minimize diff

b313258

dcherian added the "plan to merge" label (Final call for comments) Mar 26, 2023

Merge branch 'main' into kvikio

cbd030e

Illviljan mentioned this pull request Mar 31, 2023

cf-coding #7654

Closed

4 tasks

dcherian merged commit aa4361d into pydata:main Mar 31, 2023

dcherian deleted the kvikio branch March 31, 2023 15:15

TomNicholas mentioned this pull request Feb 6, 2024

Only use CopyOnWriteArray wrapper on BackendArrays #8712

Open

4 tasks

dcherian mentioned this pull request May 30, 2025

Clean up backend indexing some more #10376

Merged
