Added auxiliary coordinates lead to (time) efficiency loss #195

@BenMGeo

Description

I recently realized that we suffer a considerable loss of efficiency due to the time-related coordinates added by the backend.

The backend adds the auxiliary coordinates day_of_year, day_of_month, year, and month_number. If they are left on the cube and a diagnostic later collapses along the time axis, the processing time increases significantly!

Here is a "simple" example that shows the increase in run time:

import iris
import iris.coord_categorisation
from scipy.stats import linregress

import functools
import time

def timer(func):
    # from https://realpython.com/primer-on-python-decorators/#simple-decorators
    @functools.wraps(func)
    def wrapper_timer(*args, **kwargs):
        start_time = time.perf_counter()    
        value = func(*args, **kwargs)
        end_time = time.perf_counter()      
        run_time = end_time - start_time    
        print(f"Finished {func.__name__!r} in {run_time:.4f} secs")
        return value
    return wrapper_timer


filename = iris.sample_data_path('E1_north_america.nc')
air_temp = iris.load_cube(filename, 'air_temperature')

# removing extra coordinates
air_temp_small = air_temp.copy()

air_temp_small.remove_coord("forecast_reference_time")
air_temp_small.remove_coord("forecast_period")

# adding extra coordinates
air_temp_big = air_temp.copy()

iris.coord_categorisation.add_month_number(air_temp_big, "time", "month")
iris.coord_categorisation.add_day_of_year(air_temp_big, "time", "doy")

@timer
def Mean_Cube(cube, coords):
    # replacement for a more complex function
    return cube.collapsed(coords, iris.analysis.MEAN)

small_time = Mean_Cube(air_temp_small, "time")
# Finished 'Mean_Cube' in 0.0035 secs
print(small_time)
# ...
big_time = Mean_Cube(air_temp_big, "time")
# Finished 'Mean_Cube' in 2.2839 secs
print(big_time)
# ...

The processing time increases by a factor of ~1000!!!

print(big_time.data - small_time.data)
# = all zeroes!

The content itself is the same.

This does not occur when collapsing along dimensions that carry no auxiliary coordinates.

small_lat = Mean_Cube(air_temp_small, "latitude")
# Finished 'Mean_Cube' in 0.0046 secs
print(small_lat)
# ...

big_lat = Mean_Cube(air_temp_big, "latitude")
# Finished 'Mean_Cube' in 0.0052 secs
print(big_lat)
# ...

Diagnostic developers who use iris should be made aware of this (it is unknown whether other packages besides iris are affected). I'm not sure whether this is relevant for the backend itself, though.

I will also raise this with the iris community, as I think it is a solvable performance issue (as discussed with @bjoernbroetz).

Metadata

Labels: preprocessor (Related to the preprocessor)
