
Conversation

@DPeterK
Member

DPeterK commented Mar 13, 2015

Added a warning when subsetting a cube will cast a masked array to a numpy array.

This will happen when a cube's data attribute is a masked array with no masked points: on subsetting, the mask is dropped (i.e. the array is filled). This currently happens silently, which is undesirable if not expected.
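The behaviour can be reproduced in plain NumPy terms. This is a minimal sketch of the pattern being described, not Iris's actual implementation (`subset` is a hypothetical name):

```python
import numpy as np
import numpy.ma as ma

def subset(data, key):
    """Sketch (not Iris's actual code) of the behaviour described above:
    if the subsetted result has no masked points, the mask is silently
    dropped and a plain ndarray comes back."""
    result = data[key]
    if ma.isMaskedArray(result) and ma.count_masked(result) == 0:
        return ma.filled(result)  # plain np.ndarray
    return result

arr = ma.masked_array(np.arange(6.0), mask=[False] * 6)
print(type(subset(arr, slice(0, 3))))  # mask is lost: <class 'numpy.ndarray'>
```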

@pelson
Member

pelson commented Mar 16, 2015

This currently happens silently, which is undesirable if not expected.

Out of interest, why is this undesirable? If it is so undesirable, I'd back not having a warning and just not doing it. I'm not too fussed which way it goes, but I'm not a fan of the middle ground (aka warnings).

@ajdawson
Member

I don't like changing the type of the array when slicing; I think we just shouldn't do it. Does anyone know why we did it like this in the first place? Was it for efficiency?

@rhattersley
Member

Was it for efficiency?

That's what I remember. Switching to masked-arrays made everything slower, so the workaround was to avoid masked arrays where possible. A lot has changed since then though, including Iris optimisation of masked array creation, so it's quite possible the workaround is now unnecessary.

@DPeterK
Member Author

DPeterK commented Mar 17, 2015

In the interests of experimentation I tried running the tests with the if block that fills the mask (i.e. cube L1936-L1939) removed. This caused no test failures other than the test I added in this PR.

@rhattersley
Member

In the interests of experimentation I tried running the tests with the if block that fills the mask (i.e. cube L1936-L1939) removed.

It would be interesting to check the performance impact. For example, the execution time/memory load of running the tests, the examples, specific performance metrics, etc.
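As a first-order check, a hypothetical micro-benchmark along these lines (plain NumPy, illustrative sizes; not a substitute for running the full test suite) would give a rough feel for the cost of operating on a masked array versus an equivalent ndarray:

```python
import timeit
import numpy as np
import numpy.ma as ma

n = 1_000_000
plain = np.random.rand(n)
masked = ma.masked_array(plain, mask=np.zeros(n, dtype=bool))

# Time the same elementwise expression on both representations.
t_plain = timeit.timeit(lambda: plain * 2.0 + 1.0, number=20)
t_masked = timeit.timeit(lambda: masked * 2.0 + 1.0, number=20)

print(f"plain ndarray: {t_plain:.3f}s  masked array: {t_masked:.3f}s")
```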

@ajdawson
Member

ajdawson commented Jun 8, 2016

@dkillick - In light of #2046 + several threads on the Google Group referring to the loss of mask when slicing/extracting, is there a chance you could revive this work? If we can determine any performance impact of removing the fill of masked arrays we could make a decision and sort this out.

@DPeterK
Member Author

DPeterK commented Jun 8, 2016

@ajdawson we've a bit of internal pressure at the moment, so I won't be able to jump on it immediately. I'll add it to the v2.0 milestone though so that this isn't forgotten when time does become available.

@DPeterK added this to the v2.0 milestone Jun 8, 2016
@ajdawson
Member

ajdawson commented Jun 8, 2016

Great, thanks @dkillick

@DPeterK
Member Author

DPeterK commented Aug 1, 2017

It appears that this is still a live issue, and we still need to reach a decision on whether to return a masked array or a filled array (i.e. an ndarray).

@lbdreyer and I just tested for this behaviour on the dask masked array branch. We found that:

  • realised masked data that is subsetted will be filled, with the result being an ndarray (that is, the existing behaviour), but
  • dask lazy masked data that is subsetted and then realised will not be filled, with the result remaining a masked array with no masked points.

The difference in behaviour between lazy and real data is highly undesirable and drives home the point that a consistent solution to this issue is still required.

@ajdawson
Member

ajdawson commented Aug 1, 2017

The obvious way to reach consistency is to leave masked arrays alone and never fill them. I think we came to the conclusion that we did the filling for performance reasons, but we don't really know exactly what performance hit we might expect... Are there situations where we need to do this on slice, as opposed to allowing the user to convert to a normal array if desired?
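Leaving the conversion to the user would look something like this sketch in plain NumPy (not an Iris API): slicing preserves the masked type, and the user fills explicitly only when a plain array is actually wanted.

```python
import numpy as np
import numpy.ma as ma

data = ma.masked_array([1.0, 2.0, 3.0], mask=[False, False, False])

# Slicing a masked array keeps the type -- no silent conversion.
subset = data[:2]
assert ma.isMaskedArray(subset)

# A user who actually wants a plain ndarray converts explicitly:
plain = ma.filled(subset, fill_value=np.nan)
assert type(plain) is np.ndarray
```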

@pelson
Member

pelson commented Oct 19, 2017

@djkirkham - could you take a look at where we stand on this now that dask-mask has been merged into master. Still a live issue? As far as I understand, we are now at the mercy of dask/numpy.ma, right?

@djkirkham
Contributor

Hmm... it looks like we're currently in a "worst of both worlds" situation. Calling .data on a cube constructed from a masked array, whether lazy or not, will return a masked array even if there are no masked points. But slicing that cube and calling .data returns an ndarray if there are no masked points. There are a few other places where a no-mask masked array is converted to an ndarray - most notably CubeList.merge().

@djkirkham
Contributor

According to @pp-mo the above has always been the case unless the data was lazy, so it's not such a big issue. Still, we need a resolution. The easiest thing to do would be to just do what we did before: unmask when slicing and when realising lazy data. In the absence of any performance tests indicating that there's little or no difference performing operations on masked arrays I think it's safest to try to ensure we're operating on unmasked data.
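The fill-on-no-mask check being discussed amounts to something like this sketch (illustrative only; `ensure_unmasked` is a hypothetical name, not the actual Iris code at the lines cited earlier):

```python
import numpy as np
import numpy.ma as ma

def ensure_unmasked(array):
    """If `array` is a masked array with nothing actually masked,
    return the underlying plain ndarray; otherwise leave it alone."""
    if ma.isMaskedArray(array) and not ma.is_masked(array):
        return array.data  # view of the underlying ndarray
    return array

no_mask = ma.masked_array([1.0, 2.0], mask=[False, False])
has_mask = ma.masked_array([1.0, 2.0], mask=[True, False])
assert type(ensure_unmasked(no_mask)) is np.ndarray
assert ma.isMaskedArray(ensure_unmasked(has_mask))
```

The trade-off djkirkham describes is that this check has to be applied consistently at every slicing/realising site, or the lazy and real code paths diverge again.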

@DPeterK
Member Author

DPeterK commented Oct 23, 2017

@djkirkham I still think that Iris should not be changing the type of an object...

@djkirkham
Contributor

@dkillick I don't see it as such a big issue; users shouldn't be relying on the type (they can't currently, anyway). But I still don't like having to do the check every time, especially since we're not consistent about applying it throughout the code base. If there really is a performance hit from using masked arrays, maybe it would be better to unmask them in the computationally heavy parts of the code. But that seems like a lot of extra work for little gain.

@djkirkham
Contributor

Replaced by #2856

@djkirkham closed this Oct 26, 2017