Skip to content

Allow fast percentile method on masked arrays if mdtol=0 #4735

@rcomer

Description

@rcomer

✨ Feature Request

Enable the fast_percentile_method of the PERCENTILE aggregator to work on masked arrays if mdtol is set to zero.

Motivation

fast_percentile_method is, well, fast. So it would be good to be able to use it in a wider range of contexts. The particular use-case I'm thinking of is where all land points (and no other points) are masked and we aggregate over all the dimensions except the spatial ones. So for the grid cells we are interested in, we have a full (non-masked) set.

Additional context

The fast_percentile_method of the PERCENTILE aggregator uses numpy.percentile, which just ignores any mask, so in general we'd get the wrong values out:

import numpy as np
import numpy.ma as ma

arr = ma.array(np.tile(np.arange(6), 2).reshape(2, 6))
arr[0, 4:] = ma.masked

print(arr)
print(np.percentile(arr, 50, axis=1))
[[0 1 2 3 -- --]
 [0 1 2 3 4 5]]
/net/project/ukmo/scitools/opt_scitools/conda/deployments/default-2022_04_05/lib/python3.8/site-packages/numpy/lib/function_base.py:4650: UserWarning: Warning: 'partition' will ignore the 'mask' of the MaskedArray.
  arr.partition(
[2.5 2.5]

So we currently just don't allow it to run on masked arrays:

if fast_percentile_method:
msg = "Cannot use fast np.percentile method with masked array."
if ma.is_masked(data):
raise TypeError(msg)

However, if mdtol was set to zero, we would always add a mask over those wrong values in the result:

if (
mdtol is not None
and ma.is_masked(data)
and result is not ma.masked
):
fraction_not_missing = data.count(axis=axis) / data.shape[axis]
mask_update = 1 - mdtol > fraction_not_missing
if ma.isMaskedArray(result):
result.mask = result.mask | mask_update
else:
result = ma.array(result, mask=mask_update)

or

def _build_dask_mdtol_function(dask_stats_function):

So I think it would be safe to relax the check so that this method can still run if mdtol is zero.

Metadata

Metadata

Assignees

Labels

Peloton 🚴‍♂️Target a breakaway issue to be caught and closed by the peloton

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions