Calculation of derived coords points and bounds is always lazy. #2604
Conversation
Very recently made these changes to exactly the same code section. Tests added at that time should cover this PR as well.
Provided those recently-added tests pass, I am happy with this change.
Yes, I also removed the line ...
Errors have come in because some factory calculations are not viable on dask arrays.
Force-pushed 718fad1 to b84ff92
I think this is finally sorted.
STATUS: Currently waiting on a fix for testing problems since we provided numpy 1.13. I will also get on and post some info on the performance improvement that motivated this ...
shape[i] = size
return shape

def _dtype(self, arrays_by_key, **other_args):
@pp-mo 😱 Oh my gosh ... this isn't used anywhere! Who knows when it was relevant and useful ... good spot!
Both _shape and _dtype were needed when making LazyArray-s, so you could tell it what shape and type the result would be. Biggus or dask does all that for you, of course.
nd_values_by_key[key] = nd_values
return nd_values_by_key

def _shape(self, nd_values_by_key):
@pp-mo This was only used in OceanSigmaZFactory.make_coord(), so you must have refactored that away ...
Effectively the same calculation now exists inside OceanSigmaZFactory._derive(), but now it is done by making an array of all the dependency shapes and taking the max size over each dimension.
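As a rough, illustrative sketch of that shape logic (not the actual _derive code; the array names and sizes here are made up): the dependency arrays arrive aligned to a common number of dimensions, with size 1 wherever a dependency does not span a dimension, so the result shape is the per-dimension maximum.

```python
import numpy as np

def derived_shape(*deps):
    # Assumes every dependency already has the same ndim (size 1 for
    # missing dims), as _nd_points / _remap are said to guarantee.
    allshapes = np.array([d.shape for d in deps if d.ndim])
    return tuple(int(n) for n in np.max(allshapes, axis=0))

eta = np.zeros((2, 1, 5, 5))     # (n, 1, j, i)
zlev = np.zeros((1, 10, 1, 1))   # (1, k, 1, 1)
print(derived_shape(eta, zlev))  # (2, 10, 5, 5)
```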
lib/iris/aux_factory.py
Outdated
transpose_order = [pair[0] for pair in sorted_pairs] + [len(dims)]
bounds = coord.core_bounds()
bounds = coord.lazy_bounds()
if dims:
Ok.
lib/iris/aux_factory.py
Outdated
orography = orography_pts.reshape(
    orography_pts_shape.append(1))
bds_shape = list(orography_pts.shape) + [1]
orography = orography_pts.reshape(bds_shape)
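For context on the superseded lines above: list.append() mutates the list in place and returns None, so its result cannot be fed to reshape(); the replacement builds the bounds shape explicitly. A minimal illustration (values are made up):

```python
shape = [3, 4]            # stand-in for list(orography_pts.shape)
result = shape.append(1)
print(result)             # None -- this is what the old reshape() call received
print(shape)              # [3, 4, 1] -- mutated, but the return value is lost

# The fixed pattern builds the new shape explicitly before reshaping:
bds_shape = [3, 4] + [1]  # in the real code: list(orography_pts.shape) + [1]
print(bds_shape)          # [3, 4, 1]
```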
Personally I never liked all these called-only-once functions at all.
So given your prompt I've just removed them all -- hope that suits!
lib/iris/aux_factory.py
Outdated
nsigma_slice[index] = slice(0, int(nd_points_by_key['nsigma']))
nsigma_slice = tuple(nsigma_slice)

nsigma, = nd_points_by_key['nsigma']
@pp-mo Subtle unpacking using nsigma, ... but [nsigma] = nd_points_by_key['nsigma'] is an alternative, less subtle pattern ... your choice, it's a minor point.
Good spot, I do prefer that.
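For reference, the two equivalent single-element unpacking spellings being discussed (value is illustrative):

```python
values = [48]

nsigma, = values     # trailing-comma tuple unpacking -- easy to misread
[nsigma] = values    # list-pattern unpacking -- same effect, more visible
print(nsigma)        # 48
```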
nd_points_by_key['zlev'],
points_shape,
nsigma_slice)
nsigma,
@pp-mo You could just pass in nd_points_by_key['nsigma'] and do away with the local nsigma ... why did you unpack it? Just for convenience? I can only see it being used on line 866 below ...
Because I wanted it to be seen as an "extra" argument, i.e. a normal Python value, and not another dependency array.
lib/iris/aux_factory.py
Outdated
zlev, nsigma, coord_dims_func):
# Calculate the index of the 'z' dimension in the inputs.
# Get the cube dimension...
i_levels_cubedim, = coord_dims_func(self.dependencies['zlev'])
@pp-mo Again [i_levels_cubedim] rather than i_levels_cubedim, ... your choice.
Yes!
Here's the performance demo. First I generated some hybrid-height data regridded from ...
Then run: ...
Results: ...
CONTEXT NOTE: So in this case, the delay normally only happens after it has fetched the derived coords data. When all dependencies are realised, you get a delay every time you fetch cube.coord('altitude'), e.g. if you print the cube.
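The demo code itself was not preserved above; as a purely hypothetical sketch of the kind of timing it describes (file name and structure assumed, not taken from the original demo), the cost with lazy derived coords is only paid when the derived points are actually realised:

```python
import time
import iris

cube = iris.load_cube('hybrid_height_sample.nc')   # hypothetical input file

t0 = time.time()
altitude = cube.coord('altitude')                  # builds the derived coord (lazy points)
print('build coord : %.3f s' % (time.time() - t0))

t0 = time.time()
altitude.points                                    # realises the derived data -- the slow part
print('realise pts : %.3f s' % (time.time() - t0))
```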
lib/iris/aux_factory.py
Outdated
derived_cubedims = self.derived_dims(coord_dims_func)
i_levels_dim = i_levels_cubedim - sum(
    i_dim not in derived_cubedims
    for i_dim in range(i_levels_cubedim))
@pp-mo Would you buy into renaming i_levels_cubedim to zlev_dim, and i_levels_dim to zlev_index or zlev_offset?
There's a lot going on here, and using i_levels_... feels (to me) like it's one level of indirection that could be avoided ...
That naming was an attempt to distinguish between the dependency arguments, which are all arrays of the same dimensionality, and the "extra" args, which are ordinary Python values.
Hence the "i_" -- it means 'integer', not indirection.
The problem is, we have two contexts for 'dimension' or 'index': the original cube, and the dependency arguments. That's why I need to calculate 'i_levels_dim' from 'i_levels_cubedim'.
I'll think on ...
Isn't .index your friend here?
Isn't .index your friend here?
Yes, thanks.
I've now returned to that, the way it was done in the original code.
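A small sketch of the two equivalent ways of locating the zlev dimension within the derived dims (values are illustrative; the counting form matches the outdated diff shown above, the .index form is the simpler suggestion):

```python
derived_cubedims = (0, 2, 3, 4)   # cube dims spanned by the derived coord
i_levels_cubedim = 2              # cube dim of the 'zlev' dependency

# Counting version from the outdated diff: subtract earlier non-derived dims.
i_levels_dim = i_levels_cubedim - sum(
    i_dim not in derived_cubedims for i_dim in range(i_levels_cubedim))

# Equivalent, as suggested in the review:
assert i_levels_dim == derived_cubedims.index(i_levels_cubedim)  # both give 1
```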
[el.shape
 for el in (sigma, eta, depth, depth_c, zlev)
 if el.ndim])
result_shape = list(np.max(allshapes, axis=0))
@pp-mo This only works if all the elements have the same number of shape dimensions ... which must be the case, right? ... given the output from _remap and _remap_with_bounds, which aligns the dimensionality of everything or injects 0-d scalars for missing coordinates.
Just looking for reassurance (or convince me otherwise), but should we ensure that allshapes has equal-length elements before doing the np.max?
Yes, I believe it is guaranteed by _nd_points and _nd_bounds that the results all have the same dimensions, given they are called with the same 'ndim'.
I will try to amend some docstrings (_nd_xxx and _remap_xxx) to make this clearer: it's obvious that _remap_xxx should have docstrings really ...
lib/iris/aux_factory.py
Outdated
nsigma_levs = eta + sigma * (da.minimum(depth_c, depth) + eta)
# Expand to full shape, as it may sometimes have lower dimensionality.
ones_full_result = np.ones(result_shape, dtype=np.int16)
@pp-mo Can't we use nsigma_levs.dtype here instead of np.int16 ...
Also, shouldn't this be da.ones(...) ? To keep all things lazy ...
shouldn't this be da.ones(...) ? To keep all things lazy
I thought I'd only use dask where needed, and I thought it wasn't here because "nsigma_levs" is always lazy, so the result is "dask * numpy". However, you just reminded me that this could be creating a large real array, so I'll change!!
Can't we use nsigma_levs.dtype here instead of np.int16
Yes, we can, probably better...
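Putting the two points together, a sketch (shapes and chunks are illustrative, not from the PR) of why da.ones with the dtype of the main term is preferable: np.ones allocates the full real array immediately, while da.ones stays lazy, and the product with the lazy nsigma_levs is a dask array either way.

```python
import dask.array as da
import numpy as np

result_shape = (2, 10, 5, 5)                                   # illustrative only
nsigma_levs = da.zeros(result_shape, chunks=result_shape, dtype=np.float64)

real_ones = np.ones(result_shape, dtype=np.int16)              # allocated in memory now
lazy_ones = da.ones(result_shape, chunks=result_shape,
                    dtype=nsigma_levs.dtype)                   # deferred

# Both products are lazy dask arrays, but only the second avoids building a
# large real intermediate (and keeps the result dtype consistent).
print(type(nsigma_levs * real_ones).__name__, (nsigma_levs * real_ones).dtype)
print(type(nsigma_levs * lazy_ones).__name__, (nsigma_levs * lazy_ones).dtype)
```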
lib/iris/aux_factory.py
Outdated
# Expand to full shape, as it may sometimes have lower dimensionality.
ones_full_result = np.ones(result_shape, dtype=np.int16)
ones_nsigma_result = ones_full_result[z_slices_nsigma]
result_nsigma_levs = nsigma_levs * ones_nsigma_result
@pp-mo This is getting a tad abstract ... but why is there a need to do nsigma_levs * ones_nsigma_result?
Is it not sufficient just to do:
nsigma_levs = eta + sigma * (da.minimum(depth_c, depth) + eta)
zlev = zlev * da.ones(result_shape, dtype=nsigma_levs.dtype)
result = da.concatenate([nsigma_levs, zlev[z_slices_rest]], axis=i_levels_dim)
Is it not sufficient just to do
No, because of the way it has to work with possibly-missing dependencies.
From the CF equation:
k <= nsigma:  z(n,k,j,i) = eta(n,j,i) + sigma(k) * (min(depth_c, depth(j,i)) + eta(n,j,i))
k > nsigma:   z(n,k,j,i) = zlev(k)
From _check_dependencies, we always have zlev but maybe only one of sigma or eta.
Thus we always have a 'k' (vertical) dimension in the dependency dims (but not always any 'n' (time) dimension).
If sigma is missing, then the "main" nsigma_levs calculation yields just 'eta(n, 1, j, i)'
-- the 'k' dimension is a 1 because it isn't present in the original eta array.
For the concatenation, this must get "replicated" up to (n, nsigma, j, i).
If not, in numpy you get a different (wrong) result shape
-- and in dask that causes an actual error.
The original code had the assignment result[nsigma_slice] = ..., which does the right thing "automatically", by broadcasting.
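A minimal numpy sketch of the shape problem described above (sizes made up): when sigma is missing, the nsigma-levels term has a size-1 'k' dimension, so concatenating it directly gives too few levels, whereas multiplying by ones of the sliced full shape broadcasts it up first.

```python
import numpy as np

n, nsigma, nrest, j, i = 2, 4, 6, 3, 3

nsigma_levs = np.zeros((n, 1, j, i))     # eta-only case: 'k' collapsed to size 1
rest_levs = np.zeros((n, nrest, j, i))   # the zlev levels beyond nsigma

wrong = np.concatenate([nsigma_levs, rest_levs], axis=1)
print(wrong.shape)                       # (2, 7, 3, 3) -- only 1 + nrest levels

# Replicate up to the first-nsigma slice of the full result shape first
# (the ones_nsigma_result step), then concatenate:
right = np.concatenate([nsigma_levs * np.ones((n, nsigma, j, i)), rest_levs],
                       axis=1)
print(right.shape)                       # (2, 10, 3, 3) -- nsigma + nrest levels
```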
@pp-mo Okay, I'm done for now ... over to you!
Force-pushed 01bd13d to 8dab554
Force-pushed 507a70f to 3a4a536
That was a bit nasty. ... because ...
@pp-mo - Given the spin-up cost, I honestly can't justify the effort of me merging this today. I completely appreciate your desire for it not to go stale, and am hopeful that @bjlittle will be able to merge tomorrow when he returns. If not, I'll clear a few hours and try to get the ball rolling on it myself in the next few days. Work for you?

@pp-mo I can review this. Just give me a bit of time to check what you have done since I was last in.
Aside from wanting to know what the slices tuple whatever does, I am happy with this.
# Make a slice tuple to index the remaining z-levels.
z_slices_rest = [slice(None)] * ndims
z_slices_rest[z_dim] = slice(int(nsigma), None)
z_slices_rest = tuple(z_slices_rest)
Could you please explain to me what this bit (L.814 to L.820) does? I don't understand.
The z_slices_nsigma thing is a tuple of keys to extract the first 'nsigma' z-levels,
i.e. data[z_slices_nsigma] would be something like data[:, ..., :, :nsigma, :, ..., :]
- recall that here, 'nsigma' is just a number.
This is exactly what the original code calls nsigma_slice.
I changed over to calculating that within the _derive call, instead of passing it in from the make_coord routine, as I thought it was much clearer to have the derivation and use of it in the same place.
Meanwhile, the subsequent bit, z_slices_rest, selects the "remaining" z levels, i.e. those beyond the first nsigma:
that is, if data[z_slices_nsigma] does something like data[:, ..., :, 0:nsigma, :, ..., :],
then data[z_slices_rest] is like data[:, ..., :, nsigma:, :, ..., :].
In the new calc, we need to extract over the 'remaining' levels to get the concatenate right.
Is that any clearer?
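For illustration, a tiny sketch (dimension values made up) of the two slice tuples described above:

```python
ndims, z_dim, nsigma = 4, 1, 10

z_slices_nsigma = [slice(None)] * ndims
z_slices_nsigma[z_dim] = slice(0, int(nsigma))
z_slices_nsigma = tuple(z_slices_nsigma)     # data[:, :nsigma, :, :]

z_slices_rest = [slice(None)] * ndims
z_slices_rest[z_dim] = slice(int(nsigma), None)
z_slices_rest = tuple(z_slices_rest)         # data[:, nsigma:, :, :]
```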
@pp-mo What do you want me to do about this? I am happy to merge, but I can leave it if you would like someone more qualified to check it over first.
Hi @corinnebosley, sorry for the slow response -- I've been at a meeting most of this morning.
I think @bjlittle provisionally approved this anyway, according to #2604 (review). That should cover everything except the latest commit, which fixed the warning/error testing bug.
@pp-mo I thought that might have been the case, but I checked over everything anyway and couldn't spot any issues. Bearing that in mind, and given that the tests are passing, I will merge this in one hour (at 15:18) unless I get any objections in the meantime.
@pp-mo Boom!
@corinnebosley thanks!
…ools#2604)
* Integration tests for OceanSigmaZFactory -- lazy cases not currently working.
* Refactor osz (not yet lazy).
* Add test with extra cube dims.
* Fixed calculation; all working except lazy integration tests.
* Enable all-lazy operation; all tests working.
* Fix nasty misuses of list.append.
* Adjust testing for fixes to aux_factory code.
* CML changes: missing altitude points NaN --> mask.
* Clarify need for some integration testcases.
* Review changes.
* Reroute assertRaisesRegexp to prevent Python3 deprecation undermining warnings tests.
Addresses #2586