Description
I recently realized that we suffer quite an efficiency loss due to the auxiliary time coordinates added by the backend. The backend adds the auxiliary coordinates `day_of_year`, `day_of_month`, `year`, and `month_number`. If a diagnostic leaves them in place and then collapses along the time axis, this leads to a significant increase in processing time!
Here is a "simple" example that shows the increase in run time:
```python
import functools
import time

import iris
import iris.coord_categorisation


def timer(func):
    # from https://realpython.com/primer-on-python-decorators/#simple-decorators
    @functools.wraps(func)
    def wrapper_timer(*args, **kwargs):
        start_time = time.perf_counter()
        value = func(*args, **kwargs)
        end_time = time.perf_counter()
        run_time = end_time - start_time
        print(f"Finished {func.__name__!r} in {run_time:.4f} secs")
        return value
    return wrapper_timer
```
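As a self-contained sanity check of the decorator pattern (no iris required; the function `add` is illustration-only), the timer preserves both the wrapped function's return value and, thanks to `functools.wraps`, its name:

```python
import functools
import time


def timer(func):
    # Same decorator as above: measure wall-clock time, pass the result through.
    @functools.wraps(func)
    def wrapper_timer(*args, **kwargs):
        start_time = time.perf_counter()
        value = func(*args, **kwargs)
        end_time = time.perf_counter()
        print(f"Finished {func.__name__!r} in {end_time - start_time:.4f} secs")
        return value
    return wrapper_timer


@timer
def add(a, b):
    return a + b


result = add(2, 3)
print(result)        # 5
print(add.__name__)  # 'add' (functools.wraps keeps the original name)
```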
```python
filename = iris.sample_data_path('E1_north_america.nc')
air_temp = iris.load_cube(filename, 'air_temperature')

# removing extra coordinates
air_temp_small = air_temp.copy()
air_temp_small.remove_coord("forecast_reference_time")
air_temp_small.remove_coord("forecast_period")

# adding extra coordinates
air_temp_big = air_temp.copy()
iris.coord_categorisation.add_month_number(air_temp_big, "time", "month")
iris.coord_categorisation.add_day_of_year(air_temp_big, "time", "doy")


@timer
def Mean_Cube(cube, coords):
    # replacement for a more complex function
    return cube.collapsed(coords, iris.analysis.MEAN)


small_time = Mean_Cube(air_temp_small, "time")
# Finished 'Mean_Cube' in 0.0035 secs
print(small_time)
# ...

big_time = Mean_Cube(air_temp_big, "time")
# Finished 'Mean_Cube' in 2.2839 secs
print(big_time)
# ...
```
The processing time increases by a factor of roughly 650 (2.28 s vs. 0.0035 s)!
```python
print(big_time.data - small_time.data)
# = all zeroes!
```
The resulting data are identical; only the run time differs.
This slowdown does not occur when collapsing along dimensions that carry no auxiliary coordinates:
```python
small_lat = Mean_Cube(air_temp_small, "latitude")
# Finished 'Mean_Cube' in 0.0046 secs
print(small_lat)
# ...

big_lat = Mean_Cube(air_temp_big, "latitude")
# Finished 'Mean_Cube' in 0.0052 secs
print(big_lat)
# ...
```
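Until this is fixed, a practical workaround is to strip the categorisation coordinates from a copy of the cube before collapsing. The sketch below uses a hypothetical `CubeStub` (the names `CubeStub` and `strip_aux_coords` are illustration-only, not iris or backend API) that mimics just the `copy()`, `coords()`, and `remove_coord()` calls of `iris.cube.Cube`, so the pattern can be shown without loading sample data; with a real cube the helper body would be the same.

```python
import copy


class CubeStub:
    """Hypothetical stand-in for the bits of iris.cube.Cube used below."""

    def __init__(self, aux_coord_names):
        self._aux = list(aux_coord_names)

    def copy(self):
        return copy.deepcopy(self)

    def coords(self):
        # Real iris returns coordinate objects; plain names suffice for this sketch.
        return list(self._aux)

    def remove_coord(self, name):
        self._aux.remove(name)


def strip_aux_coords(cube, names):
    """Return a copy of `cube` without the listed auxiliary coordinates."""
    slim = cube.copy()
    for name in names:
        if name in slim.coords():
            slim.remove_coord(name)
    return slim


cube = CubeStub(["month", "doy", "forecast_period"])
slim = strip_aux_coords(cube, ["month", "doy"])
print(slim.coords())  # ['forecast_period']
print(cube.coords())  # original untouched: ['month', 'doy', 'forecast_period']
```

Working on the copy keeps the original cube intact, so the auxiliary coordinates remain available for any later calculation that does need them.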
Diagnostic developers need to be made aware of this if they use iris in their diagnostics (it is unknown whether this also affects other packages). I'm not sure whether this is relevant for the backend, though.
I will also raise this issue with the iris community, as I think this is a solvable performance issue (as discussed with @bjoernbroetz).