
Cache for area-weighted regridder #2472

@corinnebosley


This has been a major issue for several people. Here are a few links to provide some context:
#2370
https://exxconfigmgmt:6391/browse/EPM-1542


@abooton - Updating the description 03/12/2019.
As a user of the area-weighted regridder I would like to cache the area-weights (as well as snapshot the grid info) so that I can reduce the time taken to regrid multiple cubes.

Description:
As described in the iris documentation:
https://scitools.org.uk/iris/docs/latest/userguide/interpolation_and_regridding.html?highlight=regridding#caching-a-regridder
"If you need to regrid multiple cubes with a common source grid onto a common target grid you can ‘cache’ a regridder to be used for each of these regrids. This can shorten the execution time of your code as the most computationally intensive part of a regrid is setting up the regridder."

Unfortunately, the weights are not currently cached, so the benefit described is not realised when carrying out area-weighted regridding.
Although the majority of the time is currently spent calculating the weighted mean, computing the weights can be significant, e.g. ~25% for non-masked arrays (the non-masked case is the one for which stats are reported below).

See #2370 for a good example of setting the regridder up.
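
To make the request concrete, here is a minimal sketch of the caching idea on a toy 1D grid. It is not the iris implementation: the class `CachedAreaWeightedRegridder`, the helper `overlap_weights` and the `prepare_calls` counter are hypothetical names invented for illustration, and the "area" weights are simply the overlap lengths between cell bounds. The point is only that the weights are prepared once and reused, while the source grid is still checked on every call.

```python
import numpy as np


def overlap_weights(src_bounds, tgt_bounds):
    """Overlap length of each target cell with each source cell (hypothetical helper)."""
    src_lo, src_hi = src_bounds[:-1], src_bounds[1:]
    tgt_lo, tgt_hi = tgt_bounds[:-1], tgt_bounds[1:]
    # Rows are target cells, columns are source cells.
    overlap = np.minimum(tgt_hi[:, None], src_hi[None, :]) - np.maximum(
        tgt_lo[:, None], src_lo[None, :]
    )
    # Cells that do not overlap give a negative value; clip those to zero.
    return np.clip(overlap, 0.0, None)


class CachedAreaWeightedRegridder:
    """Toy 1D regridder: weights are prepared once and reused on every call."""

    def __init__(self, src_bounds, tgt_bounds):
        self.src_bounds = np.asarray(src_bounds, dtype=float)
        self.tgt_bounds = np.asarray(tgt_bounds, dtype=float)
        self.prepare_calls = 0  # only here to show how often weights are built
        self._weights = self._prepare()

    def _prepare(self):
        # The expensive step: done once, at construction time.
        self.prepare_calls += 1
        return overlap_weights(self.src_bounds, self.tgt_bounds)

    def __call__(self, data, src_bounds):
        # Retain the grid check: the data must live on the cached source grid.
        if not np.array_equal(np.asarray(src_bounds, dtype=float), self.src_bounds):
            raise ValueError("source grid differs from the grid the regridder was built for")
        data = np.asarray(data, dtype=float)
        # The cheap step: apply the cached weights as a weighted mean.
        return self._weights @ data / self._weights.sum(axis=1)


src = [0.0, 1.0, 2.0, 3.0, 4.0]   # bounds of four source cells
tgt = [0.0, 2.0, 4.0]             # bounds of two target cells
regridder = CachedAreaWeightedRegridder(src, tgt)
first = regridder([1.0, 3.0, 5.0, 7.0], src)
second = regridder([2.0, 2.0, 4.0, 8.0], src)
```

After both calls `regridder.prepare_calls` is still 1, which is the behaviour this issue asks for from the real area-weighted regridder.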

Acceptance Criteria:

  • Main API: regrid_area_weighted_rectilinear_src_and_grid API should be maintained as it is
  • Optimisation: Regridding one cube should not take longer than it does at present, according to the ASV benchmarking.
  • The area-weights should only be computed once when using the regridder method iris.analysis.AreaWeighted().regridder(cube1, cube2) (see example in Area weighted regridder caching #2370)
  • If using masked data, but the mask is all false, treat the data as if it is non-masked during the mean calculation.
  • The regridder should retain grid checking for compatibility with the src_cube
  • The current infrastructure should be followed (i.e. __prepare and __perform)
  • The _regrid_area_weighted code should remain in 'experimental/regrid.py' module (moving it into the main code is out of scope)
  • Weights should be calculated as they are now (i.e. reading from a file is out-of-scope)
  • Weights should be included as part of the regridder cache (i.e. reading from a file, or making the weights available for further use is out-of-scope)
  • If weights computation is refactored, ASV testing is required
  • If numba is used for code speed-up, it should be implemented on an "if available - use it" basis
  • Any other performance results should preferably reference the examples in this or the associated tickets (see above).
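
Two of the criteria above, the all-false mask shortcut and the optional use of numba, can be sketched together as follows. This is an illustration under stated assumptions rather than the code in 'experimental/regrid.py': the function names `weighted_mean` and `weighted_mean_kernel` are hypothetical, and the kernel works on a single 1D array instead of a cube. Only standard numpy calls (`np.ma.is_masked`, `np.ma.getdata`, `np.ma.average`) and numba's `njit` decorator are used.

```python
import numpy as np

try:
    # Optional dependency: use numba only when it can be imported.
    from numba import njit
except ImportError:
    def njit(func):
        # Fallback decorator: leave the function as plain Python.
        return func


@njit
def weighted_mean_kernel(values, weights):
    """Plain-loop weighted mean over 1D float arrays (hypothetical kernel)."""
    total = 0.0
    weight_sum = 0.0
    for i in range(values.shape[0]):
        total += values[i] * weights[i]
        weight_sum += weights[i]
    return total / weight_sum


def weighted_mean(data, weights):
    """Weighted mean that only takes the masked path when a point is really masked."""
    weights = np.asarray(weights, dtype=np.float64)
    if np.ma.isMaskedArray(data) and np.ma.is_masked(data):
        # At least one point is masked: use the masked-array average.
        return float(np.ma.average(data, weights=weights))
    # Plain array, or a masked array whose mask is all False:
    # drop the mask and use the faster non-masked kernel.
    values = np.ascontiguousarray(np.ma.getdata(data), dtype=np.float64)
    return float(weighted_mean_kernel(values, weights))


all_false = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, False, False])
partly_masked = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])
mean_all_false = weighted_mean(all_false, [1.0, 1.0, 2.0])
mean_partly_masked = weighted_mean(partly_masked, [1.0, 1.0, 2.0])
```

The same call works whether or not numba is installed; without it the kernel simply runs as ordinary Python, which keeps numba an optional speed-up rather than a hard requirement.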

Note:
The weights are currently computed alongside the weighted mean calculation, in the loop. If the weights calculation is refactored, the grid points will be looped over twice instead of once. Therefore, it is suggested that the work is developed in a feature branch, and implemented once code refactoring and optimisation are both complete.
