Loading not possible with non-standard netcdf variable names #5171

@pp-mo

A user presented some data from an online repository which has a lot of rather weird variable names (but is otherwise fairly sensible).
It is probably output from the PALM-4U atmospheric model.

The question is: should we add some tolerance to allow this?

Iris has been very strict on this since v2.3 (see #3399), and refuses to load the file,
raising ValueError: 'theta(0)' is not a valid NetCDF variable name.
I think the main problem for Iris, which motivated being stricter in that change, is that we really wouldn't want to save data with these kinds of variable names,
and maybe they could also cause other internal problems in Iris, like selecting cubes/coords by name?
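To make the strictness concrete, here is a minimal sketch of a name check applied whenever a variable name is assigned, which is roughly the behaviour described above. The class NamedThing and the TOKEN pattern are hypothetical stand-ins written for illustration, not Iris code; the pattern is an assumption about what counts as an acceptable name.

```python
import re

# Assumed rule for an acceptable name: starts with a letter or digit,
# followed by word characters or any of . + - @
TOKEN = re.compile(r"[a-zA-Z0-9][\w.+\-@]*", re.ASCII)


class NamedThing:
    """Hypothetical object whose var_name is validated on assignment."""

    def __init__(self, var_name=None):
        self.var_name = var_name  # goes through the setter below

    @property
    def var_name(self):
        return self._var_name

    @var_name.setter
    def var_name(self, name):
        # None is allowed; anything else must match the whole pattern.
        if name is not None and not TOKEN.fullmatch(name):
            raise ValueError(f"{name!r} is not a valid NetCDF variable name.")
        self._var_name = name
```

With a check like this in the setter, any loader that builds objects straight from the file's variable names will fail on the first name such as theta(0), which is why the whole load is refused rather than just one variable.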

Resolution:

I personally think that is a poor reason for erroring on load -- a tolerant adjustment would be "more ideal" IMHO.
A user comment was :

"I agree with the general principle that we shouldn’t be creating files containing such characters in variable names ... we are not always trying to load file which we have had control over the generation of ... Hence having no way of reading the file becomes quite a big restriction."

It seems a bit odd not to be able to load this, claiming that it is not "good netcdf", when there is nothing really wrong with the file. So, there's a difference between what netCDF says is generally valid, and what you can have in an actual HDF5-based netCDF4 file.
Needless to say, xarray has no problem with this data !
Adding the issue to the CF compliance discussion

Some Details...

Example file dump (shortened):

netcdf scalars_100_PALM_LES_IMUK_v2 {
dimensions:
	time = UNLIMITED ; // (1140 currently)
variables:
	double time(time) ;
		time:units = "seconds" ;
		time:long_name = "time" ;
		time:standard_name = "time" ;
		time:axis = "T" ;
	float E(time) ;
		E:units = "m2/s2" ;
		E:long_name = "E" ;
	float E\*(time) ;
		E\*:units = "m2/s2" ;
		E\*:long_name = "E*" ;
	float dt(time) ;
		dt:units = "s" ;
		dt:long_name = "dt" ;
	float us\*(time) ;
		us\*:units = "m/s" ;
		us\*:long_name = "us*" ;
	float th\*(time) ;
		th\*:units = "K" ;
		th\*:long_name = "th*" ;
  . . .

// global attributes:
		:title = "PALM 6.0  Rev: 4531  run: lanfex_iop1_stage1_low_15m.00  host: atosb_rrtmg  2021-03-30 00:08:02" ;
		:Conventions = "CF-1.7" ;
 . . .
}

List of variable names:

>>> import netCDF4 as nc
>>> ds = nc.Dataset('sample_data/scalars_100_PALM_LES_IMUK_v2.nc')
>>> print(ds.variables.keys())
dict_keys(['time', 'E', 'E*', 'dt', 'us*', 'th*', 'umax', 'vmax', 'wmax', 'div_new', 'div_old', 'zi_wtheta', 'zi_theta', 'w*', 'w"theta"0', 'w"theta"', 'wtheta', 'theta(0)', 'theta(z_mo)', 'w"u"0', 'w"v"0', 'w"q"0', 'ol', 'q*', 'w"s"', 's*', 'ghf', 'qsws_liq', 'qsws_soil', 'qsws_veg', 'r_a', 'r_s', 'rad_net', 'rad_lw_in', 'rad_lw_out', 'rad_sw_in', 'rad_sw_out', 'rrtm_aldif', 'rrtm_aldir', 'rrtm_asdif', 'rrtm_asdir', 'time_s', 'lwdn', 'lwup', 'swdn', 'swup', 'tstar', 'shf', 'lhf', 'ustar', 'blh', 'zct', 'lwp', 'tca', 'cldsed', 'rainsed', 'tscrn', 'rhscrn', 'vis', 'u10m', 'v10m', 'zcb', 'blh2', 'smax', 'smax2m', 'lwp_min', 'lwp_max'])

And all the "invalid" ones:

>>> from iris.common.metadata import _TOKEN_PARSE as tp 
>>> print([k for k in ds.variables.keys() if not tp.match(k)])
['E*', 'us*', 'th*', 'w*', 'w"theta"0', 'w"theta"', 'theta(0)', 'theta(z_mo)', 'w"u"0', 'w"v"0', 'w"q"0', 'q*', 'w"s"', 's*']
>>> 
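
As a rough illustration of what a "tolerant adjustment" on load might look like, the sketch below maps each offending name to a safe replacement while remembering the original. Everything here is an assumption made for illustration: the function tolerant_names, the two patterns, the underscore substitution and the "v" prefix are hypothetical choices, not existing Iris behaviour.

```python
import re

# Assumed rule for an acceptable name (same idea as the check used above):
# starts with a letter or digit, then word characters or any of . + - @
VALID_NAME = re.compile(r"[a-zA-Z0-9][\w.+\-@]*", re.ASCII)
INVALID_CHAR = re.compile(r"[^\w.+\-@]", re.ASCII)


def tolerant_names(names):
    """Return {original_name: safe_name}, leaving valid names unchanged."""
    # Names that are already valid are reserved so replacements cannot clash.
    used = {n for n in names if VALID_NAME.fullmatch(n)}
    mapping = {}
    for name in names:
        if VALID_NAME.fullmatch(name):
            mapping[name] = name
            continue
        # Swap every disallowed character for an underscore.
        candidate = INVALID_CHAR.sub("_", name)
        # Make sure the result starts with a letter or digit.
        if not re.match(r"[a-zA-Z0-9]", candidate):
            candidate = "v" + candidate
        # Add a numeric suffix if the replacement is already taken.
        unique, n = candidate, 1
        while unique in used:
            unique = f"{candidate}_{n}"
            n += 1
        used.add(unique)
        mapping[name] = unique
    return mapping
```

For example, tolerant_names(['E', 'E*', 'theta(0)']) would leave E alone and map the other two to E_ and theta_0_. Keeping the mapping around means the original name could still be recorded somewhere (for instance as an attribute) so nothing is lost, while the adjusted name is what gets used internally and on save.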
