-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Description
What happened?
After upgrading to Xarray 2024.11.0, a colleague from work reported to me that the forward fill method was not properly working when the limit parameter was used, causing some values to get NaN instead of the previous value.
I investigated what was causing the issue, and I found that the problem is in the method used as binop on the cumreduction for calculating the valid positions to push, it is not correctly restarting the counter because the cumreduction calls twice this function, one for adding the last value of the previous chunk to the actual one, and the second time for combining the last value of two chunks, this last call it's not working as expected because it is not able to detect if there was a restart or not of the accumulation, making that the number of nans accumulates indefinitely.
I have not found a way to correctly detect if there was a restart on the counter or not when the cumreduction calls the binop again, mainly because the function does not receive any parameter indicating the shape of the chunk, so it is hard to determine if the counter suffered a restart or not just seeing it's last accumulated value, for that reason, I propose to replace the direct use of the cumreduction (only for the valid positions) with some Dask function as shown on the attached images, unfortunately, this generates more tasks than before.
What did you expect to happen?
I expect that the results of the push method will match the in-memory version as shown in the minimal complete verifiable example.
Minimal Complete Verifiable Example
from xarray.core.dask_array_ops import push
from bottleneck import push as push_bottleneck
v = np.array([np.nan, np.nan, np.nan, 2, np.nan, np.nan, np.nan, 9, np.nan, np.nan, np.nan])
arr = da.from_array(v, chunks=3)
r1 = push(arr, 4, axis=0, method="sequential")
r2 = push_bottleneck(v, 4, axis=0)
print(r1.compute())
print(r2)
assert np.array_equal(r1.compute(), r2, equal_nan=True)MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
Output of the minimal complete verifiable example:
dask [nan nan nan 2. 2. 2. nan 9. 9. nan nan]
in-memory [nan nan nan 2. 2. 2. 2. 9. 9. 9. 9.]
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
Cell In[34], line 13
11 print(r1.compute())
12 print(r2)
---> 13 assert np.array_equal(r1.compute(), r2, equal_nan=True)
AssertionError:Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: 5279bd1
python: 3.11.10 | packaged by conda-forge | (main, Sep 10 2024, 10:53:25) [MSC v.1940 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 165 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('Spanish_Venezuela', '1252')
libhdf5: 1.14.3
libnetcdf: 4.9.2
xarray: 2025.1.2.dev1+g5279bd15.d20250111
pandas: 2.2.2
numpy: 2.0.2
scipy: 1.14.1
netCDF4: 1.7.1.post2
pydap: None
h5netcdf: None
h5py: None
zarr: 2.18.3
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: 1.4.2
dask: 2024.9.0
distributed: 2024.9.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: 0.8.2
fsspec: 2024.9.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 73.0.1
pip: 24.2
conda: None
pytest: 8.3.3
mypy: 1.11.2
IPython: 8.27.0
sphinx: None

