Time inefficient collapsing of cubes with auxiliary coordinates #3363

@BenMGeo

Description

I recently realized that collapsing cubes that carry additional auxiliary coordinates takes a surprisingly long time. (ESMValGroup/ESMValCore#195)

Here is a "simple" example that demonstrates the increase in run time:

import iris
import iris.coord_categorisation
from scipy.stats import linregress

import functools
import time

def timer(func):
    # Simple timing decorator, adapted from
    # https://realpython.com/primer-on-python-decorators/#simple-decorators
    @functools.wraps(func)
    def wrapper_timer(*args, **kwargs):
        start_time = time.perf_counter()
        value = func(*args, **kwargs)
        end_time = time.perf_counter()
        run_time = end_time - start_time
        print(f"Finished {func.__name__!r} in {run_time:.4f} secs")
        return value
    return wrapper_timer


filename = iris.sample_data_path('E1_north_america.nc')
air_temp = iris.load_cube(filename, 'air_temperature')

# removing extra coordinates
air_temp_small = air_temp.copy()

air_temp_small.remove_coord("forecast_reference_time")
air_temp_small.remove_coord("forecast_period")

# adding extra coordinates
air_temp_big = air_temp.copy()

iris.coord_categorisation.add_month_number(air_temp_big, "time", "month")
iris.coord_categorisation.add_day_of_year(air_temp_big, "time", "doy")
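
For reference, the two categorisation calls above attach auxiliary coordinates along the time dimension. A quick check (assuming the cube loaded as above) lists which auxiliary coordinates span that dimension:

(time_dim,) = air_temp_big.coord_dims(air_temp_big.coord("time"))
print([coord.name() for coord in air_temp_big.aux_coords
       if time_dim in air_temp_big.coord_dims(coord)])
# should include the new "month" and "doy" coordinates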

@timer
def Mean_Cube(cube, coords):
    # replacement for a more complex function
    return cube.collapsed(coords, iris.analysis.MEAN)
small_time = Mean_Cube(air_temp_small, "time")
# Finished 'Mean_Cube' in 0.0035 secs
print(small_time)
# ...
big_time = Mean_Cube(air_temp_big, "time")
# Finished 'Mean_Cube' in 2.2839 secs
print(big_time)
# ...

The processing time increases by a factor of roughly 650 (from ~3.5 ms to ~2.3 s)!
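
To see where the extra time goes, the slow call can be profiled directly. This is just a quick sketch using the standard-library cProfile; the sort key and line count are arbitrary choices:

import cProfile
import pstats

cProfile.run('air_temp_big.collapsed("time", iris.analysis.MEAN)', "collapse.prof")
pstats.Stats("collapse.prof").sort_stats("cumulative").print_stats(10)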

print(big_time.data - small_time.data)
# = all zeroes!

The content itself is the same.
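
For a stricter check than eyeballing the printed difference, numpy's testing helper can assert exact equality (a minimal sketch):

import numpy as np

np.testing.assert_array_equal(big_time.data, small_time.data)
# passes: no exception, the arrays match exactly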

The slowdown does not occur when collapsing along a dimension that has no auxiliary coordinates.

small_lat = Mean_Cube(air_temp_small, "latitude")
# Finished 'Mean_Cube' in 0.0046 secs
print(small_lat)
# ...

big_lat = Mean_Cube(air_temp_big, "latitude")
# Finished 'Mean_Cube' in 0.0052 secs
print(big_lat)
# ...

Is this expected (and intended) behaviour? My feeling is that the increase is substantially larger than it needs to be.
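
In the meantime, a possible workaround is to strip the auxiliary coordinates that span the collapsed dimension before collapsing, since they get reduced to bounded scalars anyway. This is only a sketch; collapse_without_aux is a hypothetical helper I made up, and it assumes the named coordinate maps to exactly one dimension:

def collapse_without_aux(cube, coord_name, aggregator):
    # Copy the cube, drop aux coords spanning the collapsed dimension,
    # then collapse as usual.
    stripped = cube.copy()
    (dim,) = stripped.coord_dims(stripped.coord(coord_name))
    for aux in stripped.aux_coords:
        if dim in stripped.coord_dims(aux):
            stripped.remove_coord(aux)
    return stripped.collapsed(coord_name, aggregator)

fast_big_time = timer(collapse_without_aux)(air_temp_big, "time", iris.analysis.MEAN)

This should bring the timing back close to the small-cube case, at the cost of losing the (collapsed) month/doy coordinates on the result.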
