
[PI] Improve load/create to default to "unknown" #3708

@abooton


Overview

As in #3585, units should default to "unknown" when loading or creating cubes and DimensionalMetadata.

Acceptance Criteria

  • All cubes and DimensionalMetadata whose units are not otherwise specified are loaded with units of "unknown". Specifically, defaults are set for:

    • Cubes
    • _DimensionalMetadata
      • Coord
      • DimCoord
      • AuxCoord
      • CellMeasure
      • AncillaryVariable
  • All cubes and DimensionalMetadata created in iris without specified units are created with units of "unknown". Specifically, defaults are set for:

    • Cubes (this is already the default)
    • _DimensionalMetadata
      • Coord
      • DimCoord
      • AuxCoord
      • CellMeasure
      • AncillaryVariable
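The defaulting rule in the acceptance criteria can be sketched as follows. This is a minimal, hypothetical model: `default_units` and `units_from_attributes` are illustrative names, not iris functions, and a plain dict stands in for a netCDF variable's attributes.

```python
from typing import Mapping, Optional

# The unit string used for undefined units.
UNKNOWN = "unknown"


def default_units(units: Optional[str] = None) -> str:
    """Units to assign when creating a cube or coordinate (hypothetical helper).

    Unspecified units fall back to "unknown" instead of the old default "1".
    """
    return UNKNOWN if units is None else units


def units_from_attributes(attributes: Mapping[str, str]) -> str:
    """Units to assign when loading a variable (hypothetical helper).

    ``attributes`` stands in for a netCDF variable's attributes. A missing
    ``units`` attribute, or (an assumption here) an empty one, is treated
    as unspecified.
    """
    units = attributes.get("units")
    return default_units(units or None)


print(default_units())                                       # unknown
print(default_units("K"))                                    # K
print(units_from_attributes({"long_name": "quality flag"}))  # unknown
print(units_from_attributes({"units": "m s-1"}))             # m s-1
```

Creation and loading share the same fallback, so both paths end up with "unknown" whenever no units are given.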

Other related unit issues:

  • Correcting saving behaviour is addressed in #3394
  • Ancillary variables will be addressed in #3473
  • Improvements to flags will be addressed in #3474


Context

Units can be classified into three types: known units, "unknown" and "no-unit", with dimensionless units treated as a subtype of known units. We interpret these types as follows:

Known dimensional units: e.g. mm
If the data's units are known and recognized by CF, they are loaded accordingly, used in calculations and saved to file. A known unit may carry a prefix, e.g. mm is the prefix "m" (milli) applied to the unit "m" (metre). (See section 3.1 of http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html#units for the supported prefixes.)

Known dimensionless units: e.g. 1 (number of "parts")
This applies to data where dimensional analysis gives a "pure ratio". The unit counts "parts", so it is typically "1", but other units, such as "degrees" or "percent", are also considered dimensionless. A dimensionless unit can also be an arbitrary scale factor: for example, "1e-6" indicates that the data are in parts per million (similar to the concept of a prefix mentioned above). These units are saved to file.
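The scale-factor idea can be illustrated with a small worked example. The table of named units and the helper names below are hypothetical; real unit conversion in iris goes through cf_units, which this sketch does not model.

```python
# Hypothetical table of named dimensionless units and their scale relative
# to the pure ratio "1" (illustrative only, not the udunits database).
NAMED_DIMENSIONLESS = {"1": 1.0, "percent": 0.01}


def dimensionless_scale(units: str) -> float:
    """Return the factor that converts a value in ``units`` to units of "1"."""
    if units in NAMED_DIMENSIONLESS:
        return NAMED_DIMENSIONLESS[units]
    # Otherwise treat the unit string as a bare scale factor, e.g. "1e-6".
    return float(units)


def to_pure_ratio(value: float, units: str) -> float:
    """Express a dimensionless value as a pure ratio (units of "1")."""
    return value * dimensionless_scale(units)


print(to_pure_ratio(5, "1e-6"))      # 5 parts per million as a pure ratio
print(to_pure_ratio(25, "percent"))  # 0.25
```

As with a prefix, the unit string only rescales the value; the quantity stays dimensionless.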

"no-unit": no-unit or no_unit
This indicates that units are not appropriate for the data, e.g. string data. Arithmetic on data with "no-unit" is disallowed, as such operations are considered inappropriate.
This concept is not described by the CF conventions (it is borrowed from cf_units), so a unit of "no-unit" will not be saved to file. Variables without units are nonetheless acceptable under the CF conventions.

"unknown":
The data's units have not been defined, or they are invalid (and hence not known). This could conceivably also describe data that ought to have "no-unit" but for which that could not be determined; generally, though, this is not assumed to be the case. Data with "unknown" units can be used in arithmetic, but the result will always have "unknown" units.
This concept is not described by the CF conventions (it is borrowed from cf_units) and therefore a unit of "unknown" will not be saved to file.
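The arithmetic rules for "no-unit" and "unknown" can be modelled as follows. This is a hypothetical sketch using plain strings: `multiply_units` is not an iris or cf_units function, and real unit arithmetic is handled by cf_units.

```python
# Unit strings with special meaning; both "no-unit" spellings from the text above.
UNKNOWN = "unknown"
NO_UNIT = {"no-unit", "no_unit"}


def multiply_units(a: str, b: str) -> str:
    """Return the units of a product, following the rules described above.

    A simplified model, not the cf_units implementation: a "no-unit"
    operand is rejected, an "unknown" operand makes the result "unknown",
    and two known units are simply joined without simplification.
    """
    if a in NO_UNIT or b in NO_UNIT:
        raise ValueError("Cannot use data with 'no-unit' in arithmetic.")
    if UNKNOWN in (a, b):
        return UNKNOWN
    # A space denotes multiplication in udunits-style unit strings.
    return f"{a} {b}"


print(multiply_units("m", "s-1"))      # m s-1
print(multiply_units("m", "unknown"))  # unknown
try:
    multiply_units("no-unit", "m")
except ValueError as err:
    print(err)
```

The ordering of the checks matters: "no-unit" is tested first, so combining "no-unit" with "unknown" is still rejected rather than silently becoming "unknown".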


Reasoning behind changes

We consider "unknown" to be the safest and most appropriate unit for iris to assign when there is insufficient information to determine a unit. It still allows arithmetic, without asserting a unit (such as "1") that may be wrong. This change has the additional benefit of allowing netCDF files containing variables whose units have been intentionally left out (as appears to be the case for quality flags) to be round-tripped.
For quality flags specifically, we have decided that "no-unit" is the more appropriate unit, since the data should not be considered a numerical quantity (this is similar to how we treat string data as having "no-unit"). However, even for such cases "unknown" is preferable to the previous default of "1", which makes "unknown" a safer default for any other unanticipated cases.
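The round-tripping benefit can be illustrated by comparing the old default of "1" with "unknown". The sketch assumes the save behaviour described in the Context section (units of "unknown" or "no-unit" are not written to file; see also #3394). `load_units`, `save_attributes` and `round_trip` are hypothetical helpers, and plain dicts stand in for netCDF attributes.

```python
from typing import Dict

# Units with no CF meaning, assumed (per the Context section) not to be written to file.
NOT_SAVED = {"unknown", "no-unit", "no_unit"}


def load_units(attributes: Dict[str, str], default: str) -> str:
    """Hypothetical loader: use the units attribute, else ``default``."""
    return attributes.get("units", default)


def save_attributes(units: str) -> Dict[str, str]:
    """Hypothetical saver: write a units attribute only for CF-meaningful units."""
    return {} if units in NOT_SAVED else {"units": units}


def round_trip(attributes: Dict[str, str], default: str) -> Dict[str, str]:
    """Load a variable's units and save them again."""
    return save_attributes(load_units(attributes, default))


flag_attrs: Dict[str, str] = {}  # a quality-flag variable with no units attribute

print(round_trip(flag_attrs, default="1"))            # {'units': '1'}  (file changed)
print(round_trip(flag_attrs, default="unknown"))      # {}  (file preserved)
print(round_trip({"units": "K"}, default="unknown"))  # {'units': 'K'}
```

With the old default, a variable that had no units gains `units = "1"` on save; with "unknown" the missing attribute stays missing, while known units are unaffected.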
