-
Notifications
You must be signed in to change notification settings - Fork 297
Description
✨ Feature Request
Enable the fast_percentile_method
of the PERCENTILE
aggregator to work on masked arrays if mdtol
is set to zero.
Motivation
fast_percentile_method
is, well, fast. So it would be good to be able to use it in a wider range of contexts. The particular use-case I'm thinking of is where all land points (and no other points) are masked and we aggregate over all the dimensions except the spatial ones. So for the grid cells we are interested in, we have a full (non-masked) set.
Additional context
The fast_percentile_method
of the PERCENTILE
aggregator uses numpy.percentile
, which just ignores any mask, so in general we'd get the wrong values out:
import numpy as np
import numpy.ma as ma
arr = ma.array(np.tile(np.arange(6), 2).reshape(2, 6))
arr[0, 4:] = ma.masked
print(arr)
print(np.percentile(arr, 50, axis=1))
[[0 1 2 3 -- --]
[0 1 2 3 4 5]]
/net/project/ukmo/scitools/opt_scitools/conda/deployments/default-2022_04_05/lib/python3.8/site-packages/numpy/lib/function_base.py:4650: UserWarning: Warning: 'partition' will ignore the 'mask' of the MaskedArray.
arr.partition(
[2.5 2.5]
So we currently just don't allow it to run on masked arrays:
iris/lib/iris/analysis/__init__.py
Lines 1283 to 1286 in 1be3a54
if fast_percentile_method: | |
msg = "Cannot use fast np.percentile method with masked array." | |
if ma.is_masked(data): | |
raise TypeError(msg) |
However, if mdtol
was set to zero, we would always add a mask over those wrong values in the result:
iris/lib/iris/analysis/__init__.py
Lines 588 to 598 in 1be3a54
if ( | |
mdtol is not None | |
and ma.is_masked(data) | |
and result is not ma.masked | |
): | |
fraction_not_missing = data.count(axis=axis) / data.shape[axis] | |
mask_update = 1 - mdtol > fraction_not_missing | |
if ma.isMaskedArray(result): | |
result.mask = result.mask | mask_update | |
else: | |
result = ma.array(result, mask=mask_update) |
or
iris/lib/iris/analysis/__init__.py
Line 1197 in 1be3a54
def _build_dask_mdtol_function(dask_stats_function): |
So I think it would be safe to relax the check so that this method can still run if mdtol
is zero.