Code Sample, a copy-pastable example if possible
import numpy as np
import xarray as xr
# create time and time_bounds DataArrays for Jan-1850 and Feb-1850
time_bounds_vals = np.array([[0.0, 31.0], [31.0, 59.0]])
time_vals = time_bounds_vals.mean(axis=1)
time_var = xr.DataArray(time_vals, dims='time',
                        coords={'time': time_vals})
time_bounds_var = xr.DataArray(time_bounds_vals, dims=('time', 'd2'),
                               coords={'time': time_vals})
# create Dataset of time and time_bounds
ds = xr.Dataset(coords={'time':time_var}, data_vars={'time_bounds':time_bounds_var})
ds.time.attrs = {'bounds': 'time_bounds', 'calendar': 'noleap',
                 'units': 'days since 1850-01-01'}
# write Jan-1850 values to file
ds.isel(time=slice(0,1)).to_netcdf('Jan-1850.nc', unlimited_dims='time')
# write Feb-1850 values to file
ds.isel(time=slice(1,2)).to_netcdf('Feb-1850.nc', unlimited_dims='time')
# use open_mfdataset to read in files, combining into 1 Dataset
decode_times = True
decode_cf = True
ds = xr.open_mfdataset(['Jan-1850.nc', 'Feb-1850.nc'],
                       decode_cf=decode_cf, decode_times=decode_times)
# write combined Dataset out
ds.to_netcdf('JanFeb-1850.nc', unlimited_dims='time')
Problem description
The above code first creates 2 netCDF files, for Jan-1850 and Feb-1850, that contain the variables time and time_bounds, with time:bounds='time_bounds'. It then reads the 2 files back in as a single Dataset using open_mfdataset, and writes this Dataset back out to a netCDF file. The ncdump output for this final file is:
netcdf JanFeb-1850 {
dimensions:
time = UNLIMITED ; // (2 currently)
d2 = 2 ;
variables:
int64 time(time) ;
time:bounds = "time_bounds" ;
time:units = "hours since 1850-01-16 12:00:00.000000" ;
time:calendar = "noleap" ;
double time_bounds(time, d2) ;
time_bounds:_FillValue = NaN ;
time_bounds:units = "days since 1850-01-01" ;
time_bounds:calendar = "noleap" ;
data:
time = 0, 708 ;
time_bounds =
0, 31,
31, 59 ;
}
The problem is that the units attributes of time and time_bounds differ in this file, contrary to what the CF conventions require.
The final call to to_netcdf is creating a file where time's units (and type) differ from what they are in the intermediate files. These transformations are not being applied to time_bounds.
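As a sanity check on the ncdump output above, the following sketch uses only the standard library to show that the rewritten time values (0 and 708 hours since 1850-01-16 12:00:00) denote the same instants as the original midpoints in days since 1850-01-01. It assumes Python's proleptic Gregorian datetime is an acceptable stand-in for the noleap calendar here, which holds for January and February 1850.

```python
from datetime import datetime, timedelta

# Bounds from the code sample, in "days since 1850-01-01".
bounds_days = [(0.0, 31.0), (31.0, 59.0)]

# time is the midpoint of each bounds pair, as in time_bounds_vals.mean(axis=1).
mid_days = [(lo + hi) / 2 for lo, hi in bounds_days]

# Reference date implied by the first midpoint. Assumption: the standard
# datetime calendar agrees with noleap over this two-month range.
ref = datetime(1850, 1, 1) + timedelta(days=mid_days[0])

# Re-express the midpoints as hours elapsed since that reference date.
hours_since_ref = [(d - mid_days[0]) * 24 for d in mid_days]

print(mid_days)                 # midpoints in days since 1850-01-01
print(ref.isoformat(sep=" "))   # reference date seen in the rewritten units
print(hours_since_ref)          # values seen in the rewritten time variable
```

So only the encoding of time changes, not the instants it represents; the inconsistency is that time_bounds keeps the original encoding.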
While the change to time's type is not necessarily an issue, I do find it surprising.
This inconsistency goes away if either of decode_times or decode_cf is set to False in the python code above. In particular, the transformations to time's units and type do not happen.
The inconsistency also goes away if open_mfdataset opens a single file. In this scenario too, the transformations to time's units and type do not happen.
I think that the desired behavior is to either not apply the units and type transformations to time, or to also apply them to time_bounds. The first option would be consistent with the current single-file behavior.
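Until the behavior changes, one possible workaround is to pin the same units and calendar in the encoding of both time and its bounds variable before calling to_netcdf. The sketch below is an assumption on my part and has not been verified against xarray 0.12.1. The helper name align_time_encoding is hypothetical; it relies only on the attrs and encoding dictionaries that xarray variables expose.

```python
def align_time_encoding(ds, units, calendar, time_name="time"):
    """Set identical units/calendar encoding on a time coordinate and its bounds.

    Hypothetical helper: `ds` is expected to behave like an xarray Dataset,
    i.e. ds[name] has dict-like `.attrs` and `.encoding`.
    Returns the names of the variables whose encoding was updated.
    """
    names = [time_name]

    # The CF "bounds" attribute on the time variable names the bounds variable.
    bounds_name = ds[time_name].attrs.get("bounds")
    if bounds_name is not None and bounds_name in ds:
        names.append(bounds_name)

    # Pin the encoding explicitly so that nothing is inferred at write time.
    for name in names:
        ds[name].encoding.update({"units": units, "calendar": calendar})

    return names
```

With the combined Dataset from the code sample, this would be called as align_time_encoding(ds, 'days since 1850-01-01', 'noleap') just before ds.to_netcdf('JanFeb-1850.nc', unlimited_dims='time').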
xarray: 0.12.1
pandas: 0.24.2
numpy: 1.16.2
scipy: 1.2.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 1.1.5
distributed: 1.26.1
matplotlib: 3.0.3
cartopy: None
seaborn: None
setuptools: 40.8.0
pip: 19.0.3
conda: None
pytest: 4.3.1
IPython: 7.4.0
sphinx: None