
error for apply_ufunc with exclude_dims and vectorize #3890

@mathause

Description


I tried to use apply_ufunc with a function that takes inputs of unequal length and therefore requires vectorize=True, which resulted in a ValueError. I think the problem stems from the signature with which np.vectorize is called.

MCVE Code Sample

import xarray as xr
import scipy as sp
import scipy.stats
import numpy as np

# create dataarrays of unequal length
ds = xr.tutorial.open_dataset("air_temperature")

da1 = ds.air
da2 = ds.air.isel(time=slice(None, 50))

# function that takes arguments of unequal length and requires vectorizing
def mannwhitneyu(x, y):
    _, p = sp.stats.mannwhitneyu(x, y)
    return p

# test that the function takes arguments of unequal length
mannwhitneyu(da1.isel(lat=0, lon=0), da2.isel(lat=0, lon=0))

xr.apply_ufunc(
    mannwhitneyu,
    da1,
    da2,
    input_core_dims=[["time"], ["time"]],
    exclude_dims=set(["time"]),
    vectorize=True,
)

Returns

ValueError: inconsistent size for core dimension 'n': 50 vs 2920

Note: the error is raised inside numpy's np.vectorize.
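The error can be reproduced without xarray or the tutorial dataset. The sketch below assumes small synthetic random arrays and uses a hypothetical `mean_diff` function as a stand-in for `mannwhitneyu` (so scipy is not needed). Giving both inputs the same core-dimension name `n` forces numpy to require equal core lengths:

```python
import numpy as np


def mean_diff(x, y):
    # Hypothetical stand-in for mannwhitneyu: reduces two 1-D samples
    # of possibly different length to a scalar.
    return x.mean() - y.mean()


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5))  # 4 "grid points", 5 samples each
y = rng.normal(size=(4, 3))  # 4 "grid points", 3 samples each

# Same core-dimension name "n" for both inputs: numpy checks that the
# last axis of x and y has the same length, which fails here (5 vs 3).
vec_same_name = np.vectorize(mean_diff, signature="(n),(n)->()", otypes=[float])

try:
    vec_same_name(x, y)
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)  # inconsistent size for core dimension 'n': 3 vs 5
```

This is the same message as in the issue, with numpy comparing the core length of the second input against the first.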

Expected Output

A DataArray.

Problem Description

I can reproduce the problem in pure numpy:

vec_wrong = np.vectorize(mannwhitneyu, signature="(n),(n)->()", otypes=[float])
vec_wrong(da1.values.T, da2.values.T)

The correct result is returned when the signature is changed:

vec_correct = np.vectorize(mannwhitneyu, signature="(m),(n)->()", otypes=[float])
vec_correct(da1.values.T, da2.values.T)
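A self-contained version of the corrected call, again assuming synthetic data and a hypothetical `mean_diff` stand-in for `mannwhitneyu`, shows that with distinct core-dimension names only the leading (loop) dimension has to match, and that the result agrees with an explicit Python loop:

```python
import numpy as np


def mean_diff(x, y):
    # Hypothetical stand-in for mannwhitneyu.
    return x.mean() - y.mean()


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5))
y = rng.normal(size=(4, 3))

# Distinct names "m" and "n" allow different core lengths (5 and 3);
# the loop dimension of size 4 is still broadcast across both inputs.
vec_distinct = np.vectorize(mean_diff, signature="(m),(n)->()", otypes=[float])
result = vec_distinct(x, y)

# Cross-check against an explicit loop over the grid points.
expected = np.array([mean_diff(x[i], y[i]) for i in range(x.shape[0])])
print(result.shape)  # (4,)
```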

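So I think the signature passed to np.vectorize needs to be changed when exclude_dims are present: each excluded core dimension should get a distinct name for each input, as in "(m),(n)->()".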
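One possible approach is sketched below with a hypothetical helper, `gufunc_signature` (not an actual xarray function), that builds the signature string from per-argument core dimensions and gives every excluded dimension a separate name per input. Dimension names are mapped to generic identifiers (`dim0`, `dim1`, ...) on the assumption that xarray dimension names need not be valid identifiers, which numpy's signature format requires:

```python
import numpy as np


def gufunc_signature(input_core_dims, output_core_dims, exclude_dims=frozenset()):
    """Hypothetical helper: build an np.vectorize signature string.

    Every distinct dimension name is mapped to a generic identifier
    (dim0, dim1, ...). Dimensions in exclude_dims get a separate
    identifier for each input argument, so numpy no longer requires
    them to have the same size across inputs.
    """
    names = {}

    def ident(dim, arg_index=None):
        # Shared dims use arg_index=None; excluded dims are keyed per argument.
        key = (dim, arg_index)
        if key not in names:
            names[key] = f"dim{len(names)}"
        return names[key]

    inputs = [
        "(" + ",".join(ident(d, i if d in exclude_dims else None) for d in dims) + ")"
        for i, dims in enumerate(input_core_dims)
    ]
    # Output dims are looked up as shared dims; an excluded dim appearing
    # in an output therefore gets a new name of its own.
    outputs = ["(" + ",".join(ident(d) for d in dims) + ")" for dims in output_core_dims]
    return ",".join(inputs) + "->" + ",".join(outputs)


# Core dims as in the MCVE: "time" on both inputs, excluded, scalar output.
sig = gufunc_signature([["time"], ["time"]], [[]], exclude_dims={"time"})
vec = np.vectorize(lambda x, y: x.mean() - y.mean(), signature=sig, otypes=[float])
print(sig)                                     # (dim0),(dim1)->()
print(vec(np.ones((2, 5)), np.zeros((2, 3))))  # [1. 1.]
```

Without `exclude_dims` the same call yields `(dim0),(dim0)->()`, which reproduces the inconsistent-size error for inputs of unequal length.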

Versions

Output of `xr.show_versions()`

This is my development environment, so I think the xarray version corresponds to 'master'.

PNC:/home/mathause/conda/envs/xarray_devel/lib/python3.7/site-packages/PseudoNetCDF/pncwarn.py:24:UserWarning:
pyproj could not be found, so IO/API coordinates cannot be converted to lat/lon; to fix, install pyproj or basemap (e.g., pip install pyproj)

INSTALLED VERSIONS

commit: None
python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 21:52:21)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-91-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.1

xarray: 0.11.1+335.gb0c336f6
pandas: 0.25.3
numpy: 1.17.3
scipy: 1.3.1
netCDF4: 1.5.3
pydap: installed
h5netcdf: 0.7.4
h5py: 2.10.0
Nio: None
zarr: 2.3.2
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: installed
rasterio: 1.1.0
cfgrib: 0.9.5.4
iris: None
bottleneck: 1.2.1
dask: 2.6.0
distributed: 2.6.0
matplotlib: 3.1.2
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 41.6.0.post20191101
pip: 19.3.1
conda: installed
pytest: 5.2.2
IPython: 7.9.0
sphinx: None
