I very much like using Iris, but the thing that now most slows me down is producing a new cube from existing cubes via the concatenate, merge and mathematical cube operations when the cubes have slightly different metadata, which causes errors. This happens often enough to eat up a fair chunk of time (perhaps a couple of hours at a go) spent correcting cube metadata in my code, checking the corrections haven't broken anything else, and so on. It is frustrating to lose a couple of hours when I just want to run some analysis on a new dataset, simply because the cubes I want to compare it with, formed from other datasets, have a lot of minor differences in their metadata. I think it also makes Iris harder for beginners to pick up.

So I wonder whether it could be made possible to turn off metadata checks that are very unlikely to indicate important inconsistencies. For example, the concatenate, merge and maths operators could take a keyword argument called "strict" that defaults to True, but which people like me who are happy to take the risk could set to False to skip the checks that tend to produce a large number of false-positive errors and slow down work. Conflicting fields would be given the value None in the resulting cube, and a warning would be raised to show what had happened. A sketch of the proposed usage is below.
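To make the idea concrete, here is roughly how the proposed keyword might look. To be clear, `strict` does not exist in Iris today, and the filename pattern is made up; only `iris.load` and `CubeList.concatenate_cube` are real API.

```python
import iris

cubes = iris.load("monthly_mean_*.nc")  # made-up filenames

# Proposed (hypothetical) keyword: strict=True would keep today's
# behaviour; strict=False would skip the low-risk metadata checks,
# set any conflicting fields to None in the result, and raise a
# warning describing what was dropped.
cube = cubes.concatenate_cube(strict=False)
```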
Specifically, I suggest that when this option is used, the cube var_name, attributes, cell_methods and metadata properties could be skipped in the consistency checks, along with long_name whenever standard_name is present. Auxiliary coordinates present in one cube but not the other could simply be removed. On the coordinates, the attributes, coord_system and var_name properties could likewise be skipped, again together with long_name when standard_name is present. I would also be happy for bounds to be neglected: if the points and units match, it seems unlikely that a real mistake is being made. Even with these more lenient checks there would still be far more of a safeguard against errors than when working directly with NumPy arrays, so it seems unlikely that the change would have terrible effects. (The sketch below shows roughly the cleanup I currently write by hand to the same effect.)
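For comparison, this is approximately the manual cleanup the proposal would replace, covering the fields listed above. `equalise_attributes` is a real utility in `iris.util`; the rest uses standard cube and coordinate setters, and the filenames are again made up.

```python
import iris
from iris.util import equalise_attributes

cubes = iris.load("run_*.nc")  # made-up filenames

# Remove cube-level attribute differences (a real Iris utility).
equalise_attributes(cubes)

# Find the auxiliary coordinates shared by every cube, so that
# coordinates present in one cube but not another can be dropped.
common = set.intersection(
    *(set(c.name() for c in cube.aux_coords) for cube in cubes)
)

for cube in cubes:
    # Clear the low-risk cube-level metadata named above.
    cube.var_name = None
    cube.cell_methods = ()
    # Remove aux coords that are not present on every cube.
    for coord in list(cube.aux_coords):
        if coord.name() not in common:
            cube.remove_coord(coord)
    # Clear the low-risk coordinate-level metadata, and neglect
    # bounds as suggested above.
    for coord in cube.coords():
        coord.var_name = None
        coord.attributes = {}
        coord.coord_system = None
        coord.bounds = None

merged = cubes.merge_cube()
```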
I'm interested to know others' thoughts on this.