Raise warning when cube masked array re-cast to numpy array #1575
Out of interest, why is this undesirable? If it is so undesirable, I'd back not having a warning and just not doing it. I'm not too fussed which way it goes, but I'm not a fan of the middle ground (aka warnings).
I don't like changing the type of the array when slicing; I think we just shouldn't do it. Does anyone know why we did it like this in the first place? Was it for efficiency?
That's what I remember. Switching to masked arrays made everything slower, so the workaround was to avoid masked arrays where possible. A lot has changed since then though, including Iris optimisation of masked array creation, so it's quite possible the workaround is now unnecessary.
In the interests of experimentation I tried running the tests with the if block that fills the mask (i.e. cube L1936-L1939) removed. This caused no test failures other than the test I added in this PR.
It would be interesting to check the performance impact. For example, the execution time/memory load of running the tests, the examples, specific performance metrics, etc.
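One rough way to start on the performance question is a micro-benchmark of the same reduction on a plain array and on a masked array with nothing masked. This is only a sketch using NumPy and `timeit`: the helper name `time_operation`, the array size and the choice of `mean()` are illustrative assumptions, not anything taken from Iris or its test suite.

```python
import timeit

import numpy as np
import numpy.ma as ma


def time_operation(array, number=20):
    """Hypothetical helper: best-of-three time in seconds for `number` calls of array.mean()."""
    return min(timeit.repeat(array.mean, number=number, repeat=3))


# Same values in both arrays; the masked array has no masked points.
plain = np.random.default_rng(0).random((500, 500))
masked = ma.masked_array(plain, mask=np.zeros(plain.shape, dtype=bool))

plain_time = time_operation(plain)
masked_time = time_operation(masked)
print(f"ndarray: {plain_time:.4f} s, masked array: {masked_time:.4f} s")
```

The timings depend on the machine and the NumPy version, so treat them as a starting point for discussion, not a verdict.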
@dkillick - In light of #2046 + several threads on the Google Group referring to the loss of mask when slicing/extracting, is there a chance you could revive this work? If we can determine any performance impact of removing the fill of masked arrays we could make a decision and sort this out.
@ajdawson we've a bit of internal pressure at the moment, so I won't be able to jump on it immediately. I'll add it to the v2.0 milestone though so that this isn't forgotten when time does become available.
Great, thanks @dkillick
It appears that this is still a live issue, and we still need to reach a decision on whether to return a masked array or a filled array (i.e. ndarray). @lbdreyer and I just tested for this behaviour on the dask masked array branch. We found that:
The difference in behaviour between lazy and real data is highly undesirable and drives home the point that a consistent solution to this issue is still required.
The obvious way to reach consistency is to leave masked arrays alone and never fill them. I think we came to the conclusion that we did the filling for performance reasons, but we don't really know exactly what performance hit we might expect... Are there situations where we need to do this on slice, as opposed to allowing the user to convert to a normal array if desired?
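For comparison, here is how plain NumPy behaves if nothing fills the array on slice and the user converts explicitly. This is a NumPy-only sketch, not Iris code, and the `np.nan` fill value is just an illustrative choice.

```python
import numpy as np
import numpy.ma as ma

data = ma.masked_array([1.0, 2.0, 3.0], mask=[False, False, True])

# Plain NumPy slicing keeps the MaskedArray type, even though
# this particular slice contains no masked points.
subset = data[:2]

# Explicit, user-requested conversion to a plain ndarray.
as_ndarray = subset.filled(np.nan)
```

With this approach the type only changes when the user asks for it, which is the consistency argument made above.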
@djkirkham - could you take a look at where we stand on this now that dask-mask has been merged into master? Is this still a live issue? As far as I understand, we are now at the mercy of dask/numpy.ma, right?
Hmm... it looks like we're currently in a "worst of both worlds" situation. Calling
According to @pp-mo the above has always been the case unless the data was lazy, so it's not such a big issue. Still, we need a resolution. The easiest thing to do would be to just do what we did before: unmask when slicing and when realising lazy data. In the absence of any performance tests indicating that there's little or no difference when performing operations on masked arrays, I think it's safest to try to ensure we're operating on unmasked data.
@djkirkham I still think that Iris should not be changing the type of an object...
@dkillick I don't see it as such a big issue; users shouldn't be relying on the type (they can't currently anyway). But I still don't like having to do the check every time, especially since we're not consistent about applying it throughout the code base. If there really is a performance hit from using a masked array, maybe it would be better to unmask it in computationally heavy parts of the code. But that seems like a lot of extra work for little gain.
Replaced by #2856
Added a warning for when subsetting a cube casts a masked array to a numpy array.
This happens when the cube's data attribute is a masked array with no masked points: the array is filled and becomes a plain numpy array. The cast currently happens silently, which is undesirable if not expected.
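The behaviour described here can be sketched with NumPy alone. The helper `subset_with_warning` and its warning text are hypothetical, written for illustration; this is not the code in Iris or in this PR, only an approximation of the fill-on-slice step discussed in the conversation.

```python
import warnings

import numpy as np
import numpy.ma as ma


def subset_with_warning(data, keys):
    """Hypothetical helper: index `data`, then mimic the fill-on-slice step."""
    result = data[keys]
    # If the subset is a masked array with no masked points, fill it,
    # which turns it into a plain ndarray, and warn about the type change.
    if isinstance(result, ma.MaskedArray) and ma.count_masked(result) == 0:
        warnings.warn(
            "Masked array has no masked points; casting to numpy.ndarray."
        )
        result = result.filled()
    return result


data = ma.masked_array([1.0, 2.0, 3.0, 4.0], mask=[False, False, True, False])

# This subset contains a masked point, so it stays a MaskedArray.
still_masked = subset_with_warning(data, slice(1, 4))
```

Calling `subset_with_warning(data, slice(0, 2))` instead selects only unmasked points, so it emits the warning and returns a plain `numpy.ndarray`.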