Skip to content

CF: also decode time bounds when available #2571

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Dec 19, 2018
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions doc/whats-new.rst
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,10 @@ Breaking changes
Enhancements
~~~~~~~~~~~~

- Time bounds variables are now also decoded according to CF conventions
(:issue:`2565`).
By `Fabien Maussion <https://github.com/fmaussion>`_.

Bug fixes
~~~~~~~~~

Expand Down
19 changes: 19 additions & 0 deletions xarray/conventions.py
Original file line number Diff line number Diff line change
Expand Up @@ -350,6 +350,25 @@ def stackable(dim):
drop_variables = []
drop_variables = set(drop_variables)

# Time bounds coordinates might miss the decoding attributes
# https://github.com/pydata/xarray/issues/2565
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it also make sense to add a comment with the URL for the "Cell Boundaries" section of the CF conventions here?

if decode_times:
# We list all time variables with bounds
to_update = dict()
for v in variables.values():
attrs = v.attrs
if ('units' in attrs and 'since' in attrs['units'] and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be cleaner to write something like:

has_date_units = 'units' in attrs and 'since' in attrs['units']
if has_date_units and 'bounds' in attrs:

'bounds' in attrs):
new_attrs = {'units': attrs['units']}
if 'calendar' in attrs:
new_attrs['calendar'] = attrs['calendar']
to_update[attrs['bounds']] = new_attrs
# For all bound variables, update their time attribute if not present
for k, new_attrs in to_update.items():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could update the attributes of the potential bounds variables immediately within the for v in variables.values() loop rather than use another loop here (this would remove the need for the to_update dictionary).

if k in variables:
for ak, av in new_attrs.items():
variables[k].attrs.setdefault(ak, av)

new_vars = OrderedDict()
for k, v in iteritems(variables):
if k in drop_variables:
Expand Down
56 changes: 56 additions & 0 deletions xarray/tests/test_coding_times.py
Original file line number Diff line number Diff line change
Expand Up @@ -624,6 +624,62 @@ def test_decode_cf(calendar):
assert ds.test.dtype == np.dtype('M8[ns]')


def test_decode_cf_time_bounds():

da = DataArray(np.arange(6).reshape((3, 2)), coords={'time': [1, 2, 3]},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might it make sense to make this DataArray a fixture and split these cases into separate tests?

dims=('time', 'nbnd'), name='time_bnds')

# Simplest case
ds = da.to_dataset()
ds['time'].attrs['units'] = 'days since 2001-01-01'
ds['time'].attrs['calendar'] = 'standard'
ds['time'].attrs['bounds'] = 'time_bnds'
ds = decode_cf(ds)
assert ds.time_bnds.dtype == np.dtype('M8[ns]')

# Without calendar also OK
ds = da.to_dataset()
ds['time'].attrs['units'] = 'days since 2001-01-01'
ds['time'].attrs['bounds'] = 'time_bnds'
ds = decode_cf(ds)
assert ds.time_bnds.dtype == np.dtype('M8[ns]')

# If previous calendar and units do not overwrite them
if has_cftime_or_netCDF4:
ds = da.to_dataset()
ds['time'].attrs['units'] = 'days since 2001-01-01'
ds['time'].attrs['calendar'] = 'standard'
ds['time'].attrs['bounds'] = 'time_bnds'
ds['time_bnds'].attrs['units'] = 'hours since 2001-01-01'
ds['time_bnds'].attrs['calendar'] = 'noleap'
ds = decode_cf(ds)
assert ds.time_bnds.dtype == np.dtype('O')
assert ds.time_bnds.encoding['units'] == 'hours since 2001-01-01'

# If bounds not available do not complain
ds = da.to_dataset()
ds['time'].attrs['units'] = 'days since 2001-01-01'
ds['time'].attrs['calendar'] = 'standard'
ds['time'].attrs['bounds'] = 'fake_time_bnds'
ds = decode_cf(ds)
assert ds.time_bnds.dtype == np.dtype('int64')

# If not proper time do not change bounds
ds = da.to_dataset()
ds['time'].attrs['units'] = 'days in 2001-01-01'
ds['time'].attrs['calendar'] = 'standard'
ds['time'].attrs['bounds'] = 'time_bnds'
ds = decode_cf(ds)
assert ds.time_bnds.dtype == np.dtype('int64')

# Only decode if asked for
ds = da.to_dataset()
ds['time'].attrs['units'] = 'days since 2001-01-01'
ds['time'].attrs['bounds'] = 'time_bnds'
ds = decode_cf(ds, decode_times=False)
assert ds.time_bnds.dtype == np.dtype('int64')


@pytest.fixture(params=_ALL_CALENDARS)
def calendar(request):
return request.param
Expand Down