Added PNC backend to xarray #1905

Merged: 59 commits, merged on Jun 1, 2018.

Commits
eeeb3a5
Added PNC backend to xarray
barronh Oct 27, 2017
5caea63
Fast Forwarding Branch to Make PNC-updates to enable auto merge
barronh Feb 12, 2018
5ac4b6f
Added whats-new documentation
barronh Feb 12, 2018
f73436d
Updating pnc_ to remove DunderArrayMixin dependency
barronh Feb 13, 2018
9507303
Adding basic tests for pnc
barronh Feb 13, 2018
ef22872
Updating for flake8 compliance
barronh Feb 14, 2018
56f087c
flake does not like unused e
barronh Feb 17, 2018
3c023a0
Merge branch 'master' of https://github.com/pydata/xarray into pnc-ba…
barronh Feb 17, 2018
5a3c62d
Updating pnc to PseudoNetCDF
barronh Mar 7, 2018
8eb427d
Remove outer except
barronh Mar 7, 2018
ca75c76
Updating pnc to PseudoNetCDF
barronh Mar 7, 2018
196c03f
Added open and updated init
barronh Mar 17, 2018
751ba1e
Merging to address indexing
barronh Mar 17, 2018
282408f
Updated indexing and test fix
barronh Mar 17, 2018
b1890b1
Added PseudoNetCDF to doc/io.rst
barronh Mar 20, 2018
eda629f
Changing test subtype
barronh Mar 20, 2018
816c7da
Changing test subtype
barronh Mar 20, 2018
c8b2ca3
pnc test case requires netcdf3only
barronh Mar 20, 2018
85ac334
adding backend_kwargs default as dict
barronh Mar 24, 2018
c46caeb
Upgrading tests to CFEncodedDataTest
barronh Mar 24, 2018
6838885
Not currently supporting autoclose
barronh Mar 24, 2018
c3b7c82
Minor updates for flake8
barronh Mar 24, 2018
7906492
Explicit skipping
barronh Mar 25, 2018
4df9fba
removing trailing whitespace from pytest skip
barronh Mar 27, 2018
e4900ab
Merge branch 'master' of https://github.com/pydata/xarray into pnc-ba…
barronh Mar 29, 2018
ec95a3a
Adding pip support
barronh Apr 3, 2018
ad7b709
Addressing comments
barronh Apr 14, 2018
26dd0f9
Bypassing pickle, mask/scale, and object
barronh Apr 15, 2018
d999de1
Added uamiv test
barronh Apr 15, 2018
87e8612
Adding support for autoclose
barronh Apr 15, 2018
dd94be5
Adding backend_kwargs to all backends
barronh Apr 15, 2018
2311701
Small tweaks to PNC backend
shoyer Apr 16, 2018
9791b8a
Merge branch 'master' into pnc-backend
shoyer Apr 16, 2018
1d7ad4a
remove warning and update whats-new
barronh Apr 18, 2018
229715a
Merged so that whats-new could be updated
barronh Apr 18, 2018
68997e0
Separating install and io pnc doc and updating whats new
barronh Apr 18, 2018
d007bc6
merging renames
barronh Apr 18, 2018
70968ca
fixing line length in test
barronh Apr 18, 2018
c2788b2
updating whats-new and merging
barronh Apr 21, 2018
1f3287e
Tests now use non-netcdf files
barronh Apr 28, 2018
abacc1d
Removing unknown meta-data netcdf support.
barronh Apr 28, 2018
a136ea3
Merge branch 'master' of https://github.com/pydata/xarray into pnc-ba…
barronh Apr 28, 2018
7d8a8ee
flake8 cleanup
barronh Apr 28, 2018
24c8376
Using python 2 and 3 compat testing
barronh Apr 28, 2018
214f51c
Disabling mask_and_scale by default
barronh Apr 28, 2018
5786291
consistent with 3.0.0
barronh May 2, 2018
066cdd5
Updating readers and line length
barronh May 2, 2018
9231e3f
Updating readers and line length
barronh May 2, 2018
80d03a7
Updating readers and line length
barronh May 2, 2018
d2c01de
Adding open_mfdataset test
barronh May 13, 2018
e12288d
merging and updating time test
barronh May 13, 2018
a179c25
Merge branch 'master' of https://github.com/pydata/xarray into pnc-ba…
barronh May 22, 2018
eaa37fe
Using conda version of PseudoNetCDF
barronh May 30, 2018
590e919
Removing xfail for netcdf
barronh May 30, 2018
0df1e60
Merge branch 'master' of https://github.com/pydata/xarray into pnc-ba…
barronh May 30, 2018
989fa4b
Moving pseudonetcdf to v0.15
barronh May 30, 2018
d71bb60
Updating what's new
barronh May 30, 2018
b9b64ca
Fixing open_dataarray CF options
barronh May 30, 2018
10c9bfa
Merge branch 'master' into pnc-backend
shoyer Jun 1, 2018
Files changed
1 change: 1 addition & 0 deletions ci/requirements-py36.yml
@@ -20,6 +20,7 @@ dependencies:
- rasterio
- bottleneck
- zarr
- pseudonetcdf>=3.0.1
- pip:
- coveralls
- pytest-cov
7 changes: 5 additions & 2 deletions doc/installing.rst
@@ -28,6 +28,9 @@ For netCDF and IO
- `cftime <https://unidata.github.io/cftime>`__: recommended if you
want to encode/decode datetimes for non-standard calendars or dates before
year 1678 or after year 2262.
- `PseudoNetCDF <http://github.com/barronh/pseudonetcdf/>`__: recommended
for accessing CAMx, GEOS-Chem (bpch), NOAA ARL files, ICARTT files
(ffi1001), and many others.

For accelerating xarray
~~~~~~~~~~~~~~~~~~~~~~~
@@ -65,9 +68,9 @@ with its recommended dependencies using the conda command line tool::

.. _conda: http://conda.io/

We recommend using the community maintained `conda-forge <https://conda-forge.github.io/>`__ channel if you need difficult\-to\-build dependencies such as cartopy or pynio::
We recommend using the community maintained `conda-forge <https://conda-forge.github.io/>`__ channel if you need difficult\-to\-build dependencies such as cartopy, pynio or PseudoNetCDF::

$ conda install -c conda-forge xarray cartopy pynio
$ conda install -c conda-forge xarray cartopy pynio pseudonetcdf

New releases may also appear in conda-forge before being updated in the default
channel.
23 changes: 22 additions & 1 deletion doc/io.rst
@@ -650,7 +650,26 @@ We recommend installing PyNIO via conda::

.. _PyNIO: https://www.pyngl.ucar.edu/Nio.shtml

.. _combining multiple files:
.. _io.PseudoNetCDF:

Formats supported by PseudoNetCDF
---------------------------------

xarray can also read CAMx, BPCH, ARL PACKED BIT, and many other file
formats supported by PseudoNetCDF_, if PseudoNetCDF is installed.
PseudoNetCDF can also provide Climate and Forecast (CF) Conventions
metadata for CMAQ files. In addition, PseudoNetCDF automatically
registers custom readers that subclass PseudoNetCDF.PseudoNetCDFFile.
PseudoNetCDF can identify the reader heuristically, or the format can be
specified via a key in ``backend_kwargs``.

To use PseudoNetCDF to read such files, supply
``engine='pseudonetcdf'`` to :py:func:`~xarray.open_dataset`.

Add ``backend_kwargs={'format': '<format name>'}``, where the ``<format name>``
options are listed on the PseudoNetCDF page.

.. _PseudoNetCDF: http://github.com/barronh/PseudoNetCDF
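
For example, a minimal sketch of reading a CAMx file this way (the file
name here is hypothetical; ``uamiv`` is one of the format names that
PseudoNetCDF registers)::

    import xarray as xr

    camx = xr.open_dataset('example.uamiv',
                           engine='pseudonetcdf',
                           backend_kwargs={'format': 'uamiv'})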


Formats supported by Pandas
@@ -662,6 +681,8 @@ exporting your objects to pandas and using its broad range of `IO tools`_.
.. _IO tools: http://pandas.pydata.org/pandas-docs/stable/io.html


.. _combining multiple files:


Combining multiple files
------------------------
4 changes: 4 additions & 0 deletions doc/whats-new.rst
@@ -41,6 +41,10 @@ Enhancements
dask<0.17.4. (related to :issue:`2203`)
By `Keisuke Fujii <https://github.com/fujiisoup>`_.

- Added a PseudoNetCDF backend for many atmospheric data formats including
GEOS-Chem, CAMx, NOAA ARL packed bit, and many others.
By `Barron Henderson <https://github.com/barronh>`_.

- :py:meth:`~DataArray.cumsum` and :py:meth:`~DataArray.cumprod` now support
aggregation over multiple dimensions at the same time. This is the default
behavior when dimensions are not specified (previously this raised an error).
2 changes: 2 additions & 0 deletions xarray/backends/__init__.py
@@ -10,6 +10,7 @@
from .pynio_ import NioDataStore
from .scipy_ import ScipyDataStore
from .h5netcdf_ import H5NetCDFStore
from .pseudonetcdf_ import PseudoNetCDFDataStore
from .zarr import ZarrStore

__all__ = [
@@ -21,4 +22,5 @@
'ScipyDataStore',
'H5NetCDFStore',
'ZarrStore',
'PseudoNetCDFDataStore',
]
55 changes: 42 additions & 13 deletions xarray/backends/api.py
@@ -152,9 +152,10 @@ def _finalize_store(write, store):


def open_dataset(filename_or_obj, group=None, decode_cf=True,
mask_and_scale=True, decode_times=True, autoclose=False,
mask_and_scale=None, decode_times=True, autoclose=False,
concat_characters=True, decode_coords=True, engine=None,
chunks=None, lock=None, cache=None, drop_variables=None):
chunks=None, lock=None, cache=None, drop_variables=None,
backend_kwargs=None):
"""Load and decode a dataset from a file or file-like object.

Parameters
@@ -178,7 +179,8 @@ def open_dataset(filename_or_obj, group=None, decode_cf=True,
taken from variable attributes (if they exist). If the `_FillValue` or
`missing_value` attribute contains multiple values a warning will be
issued and all array values matching one of the multiple values will
be replaced by NA.
be replaced by NA. mask_and_scale defaults to True except for the
pseudonetcdf backend.
decode_times : bool, optional
If True, decode times encoded in the standard NetCDF datetime format
into datetime objects. Otherwise, leave them encoded as numbers.
@@ -194,7 +196,7 @@
decode_coords : bool, optional
If True, decode the 'coordinates' attribute to identify coordinates in
the resulting dataset.
engine : {'netcdf4', 'scipy', 'pydap', 'h5netcdf', 'pynio'}, optional
engine : {'netcdf4', 'scipy', 'pydap', 'h5netcdf', 'pynio', 'pseudonetcdf'}, optional
Engine to use when reading files. If not provided, the default engine
is chosen based on available dependencies, with a preference for
'netcdf4'.
@@ -219,6 +221,10 @@
A variable or list of variables to exclude from being parsed from the
dataset. This may be useful to drop variables with problems or
inconsistent values.
backend_kwargs: dictionary, optional
A dictionary of keyword arguments to pass on to the backend. This
may be useful when backend options would improve performance or
allow user control of dataset processing.

Returns
-------
@@ -229,6 +235,10 @@
--------
open_mfdataset
"""

if mask_and_scale is None:
mask_and_scale = not engine == 'pseudonetcdf'
Member:

Actually, let's do this for decode_cf instead of mask_and_scale. I don't think the other netCDF-specific decoding, like concatenating characters, is relevant to pseudonetcdf either.

Also, let's be sure to change this for open_mfdataset (and open_dataarray) as well.

Contributor Author:

Next push:

  • I switched from mask_and_scale to decode_cf.
  • open_mfdataset uses a pass-through of keywords.
  • open_dataarray has been updated.

FYI - this leads to an odd outcome: if I want to decode_times, I have to explicitly enable decode_cf, then disable all the other decoding options individually, and enable decode_times.

Member:

Oh... I didn't realize you do want to decode times. In that case, I guess disabling only mask_and_scale is a reasonable choice.

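For context, a rough sketch of the awkward call described above, under the
intermediate decode_cf-based default (the file name and format here are
illustrative, not from this PR):

    import xarray as xr

    # hypothetical: to get decode_times alone, decode_cf must be re-enabled
    # and every other decoding step switched off by hand
    ds = xr.open_dataset('example.uamiv', engine='pseudonetcdf',
                         backend_kwargs={'format': 'uamiv'},
                         decode_cf=True, mask_and_scale=False,
                         concat_characters=False, decode_coords=False,
                         decode_times=True)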

if not decode_cf:
mask_and_scale = False
decode_times = False
@@ -238,6 +248,9 @@ def open_dataset(filename_or_obj, group=None, decode_cf=True,
if cache is None:
cache = chunks is None

if backend_kwargs is None:
backend_kwargs = {}

def maybe_decode_store(store, lock=False):
ds = conventions.decode_cf(
store, mask_and_scale=mask_and_scale, decode_times=decode_times,
@@ -303,18 +316,26 @@ def maybe_decode_store(store, lock=False):
if engine == 'netcdf4':
store = backends.NetCDF4DataStore.open(filename_or_obj,
group=group,
autoclose=autoclose)
autoclose=autoclose,
**backend_kwargs)
elif engine == 'scipy':
store = backends.ScipyDataStore(filename_or_obj,
autoclose=autoclose)
autoclose=autoclose,
**backend_kwargs)
elif engine == 'pydap':
store = backends.PydapDataStore.open(filename_or_obj)
store = backends.PydapDataStore.open(filename_or_obj,
**backend_kwargs)
elif engine == 'h5netcdf':
store = backends.H5NetCDFStore(filename_or_obj, group=group,
autoclose=autoclose)
autoclose=autoclose,
**backend_kwargs)
elif engine == 'pynio':
store = backends.NioDataStore(filename_or_obj,
autoclose=autoclose)
autoclose=autoclose,
**backend_kwargs)
elif engine == 'pseudonetcdf':
store = backends.PseudoNetCDFDataStore.open(
filename_or_obj, autoclose=autoclose, **backend_kwargs)
else:
raise ValueError('unrecognized engine for open_dataset: %r'
% engine)
@@ -334,9 +355,10 @@ def maybe_decode_store(store, lock=False):


def open_dataarray(filename_or_obj, group=None, decode_cf=True,
mask_and_scale=True, decode_times=True, autoclose=False,
mask_and_scale=None, decode_times=True, autoclose=False,
concat_characters=True, decode_coords=True, engine=None,
chunks=None, lock=None, cache=None, drop_variables=None):
chunks=None, lock=None, cache=None, drop_variables=None,
backend_kwargs=None):
"""Open an DataArray from a netCDF file containing a single data variable.

This is designed to read netCDF files with only one data variable. If
@@ -363,7 +385,8 @@ def open_dataarray(filename_or_obj, group=None, decode_cf=True,
taken from variable attributes (if they exist). If the `_FillValue` or
`missing_value` attribute contains multiple values a warning will be
issued and all array values matching one of the multiple values will
be replaced by NA.
be replaced by NA. mask_and_scale defaults to True except for the
pseudonetcdf backend.
decode_times : bool, optional
If True, decode times encoded in the standard NetCDF datetime format
into datetime objects. Otherwise, leave them encoded as numbers.
@@ -403,6 +426,10 @@ def open_dataarray(filename_or_obj, group=None, decode_cf=True,
A variable or list of variables to exclude from being parsed from the
dataset. This may be useful to drop variables with problems or
inconsistent values.
backend_kwargs: dictionary, optional
A dictionary of keyword arguments to pass on to the backend. This
may be useful when backend options would improve performance or
allow user control of dataset processing.

Notes
-----
@@ -417,13 +444,15 @@
--------
open_dataset
"""

dataset = open_dataset(filename_or_obj, group=group, decode_cf=decode_cf,
mask_and_scale=mask_and_scale,
decode_times=decode_times, autoclose=autoclose,
concat_characters=concat_characters,
decode_coords=decode_coords, engine=engine,
chunks=chunks, lock=lock, cache=cache,
drop_variables=drop_variables)
drop_variables=drop_variables,
backend_kwargs=backend_kwargs)

if len(dataset.data_vars) != 1:
raise ValueError('Given file dataset contains more than one data '
101 changes: 101 additions & 0 deletions xarray/backends/pseudonetcdf_.py
@@ -0,0 +1,101 @@
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import functools

import numpy as np

from .. import Variable
from ..core.pycompat import OrderedDict
from ..core.utils import (FrozenOrderedDict, Frozen)
from ..core import indexing

from .common import AbstractDataStore, DataStorePickleMixin, BackendArray


class PncArrayWrapper(BackendArray):

def __init__(self, variable_name, datastore):
self.datastore = datastore
self.variable_name = variable_name
array = self.get_array()
self.shape = array.shape
self.dtype = np.dtype(array.dtype)

def get_array(self):
self.datastore.assert_open()
return self.datastore.ds.variables[self.variable_name]

def __getitem__(self, key):
key, np_inds = indexing.decompose_indexer(
key, self.shape, indexing.IndexingSupport.OUTER_1VECTOR)

with self.datastore.ensure_open(autoclose=True):
array = self.get_array()[key.tuple] # index backend array

if len(np_inds.tuple) > 0:
# index the loaded np.ndarray
array = indexing.NumpyIndexingAdapter(array)[np_inds]
return array


class PseudoNetCDFDataStore(AbstractDataStore, DataStorePickleMixin):
"""Store for accessing datasets via PseudoNetCDF
"""
@classmethod
def open(cls, filename, format=None, writer=None,
autoclose=False, **format_kwds):
from PseudoNetCDF import pncopen
opener = functools.partial(pncopen, filename, **format_kwds)
ds = opener()
mode = format_kwds.get('mode', 'r')
return cls(ds, mode=mode, writer=writer, opener=opener,
autoclose=autoclose)

def __init__(self, pnc_dataset, mode='r', writer=None, opener=None,
autoclose=False):

if autoclose and opener is None:
raise ValueError('autoclose requires an opener')

self._ds = pnc_dataset
self._autoclose = autoclose
self._isopen = True
self._opener = opener
self._mode = mode
super(PseudoNetCDFDataStore, self).__init__()

def open_store_variable(self, name, var):
with self.ensure_open(autoclose=False):
data = indexing.LazilyOuterIndexedArray(
PncArrayWrapper(name, self)
)
attrs = OrderedDict((k, getattr(var, k)) for k in var.ncattrs())
return Variable(var.dimensions, data, attrs)

def get_variables(self):
with self.ensure_open(autoclose=False):
return FrozenOrderedDict((k, self.open_store_variable(k, v))
for k, v in self.ds.variables.items())

def get_attrs(self):
with self.ensure_open(autoclose=True):
return Frozen(dict([(k, getattr(self.ds, k))
for k in self.ds.ncattrs()]))

def get_dimensions(self):
with self.ensure_open(autoclose=True):
return Frozen(self.ds.dimensions)

def get_encoding(self):
encoding = {}
encoding['unlimited_dims'] = set(
[k for k in self.ds.dimensions
if self.ds.dimensions[k].isunlimited()])
return encoding

def close(self):
if self._isopen:
self.ds.close()
self._isopen = False
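
As a rough sketch, the store can also be exercised directly, assuming
PseudoNetCDF is installed; the usual route is engine='pseudonetcdf', and
'ffi1001' is PseudoNetCDF's ICARTT reader name, matching the fixture below:

    import xarray as xr
    from xarray.backends import PseudoNetCDFDataStore

    # build the store by hand, then let open_dataset decode it like any
    # other backend; mask_and_scale is disabled, as this backend expects
    store = PseudoNetCDFDataStore.open('xarray/tests/data/example.ict',
                                       format='ffi1001')
    ds = xr.open_dataset(store, mask_and_scale=False)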
1 change: 1 addition & 0 deletions xarray/tests/__init__.py
@@ -68,6 +68,7 @@ def _importorskip(modname, minversion=None):
has_netCDF4, requires_netCDF4 = _importorskip('netCDF4')
has_h5netcdf, requires_h5netcdf = _importorskip('h5netcdf')
has_pynio, requires_pynio = _importorskip('Nio')
has_pseudonetcdf, requires_pseudonetcdf = _importorskip('PseudoNetCDF')
has_cftime, requires_cftime = _importorskip('cftime')
has_dask, requires_dask = _importorskip('dask')
has_bottleneck, requires_bottleneck = _importorskip('bottleneck')
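A minimal sketch of how the new marker guards a test, following the suite's
usual pattern (the test body here is hypothetical):

    import xarray as xr
    from xarray.tests import requires_pseudonetcdf

    @requires_pseudonetcdf
    def test_ict_read():
        # skipped automatically when PseudoNetCDF is not importable
        ds = xr.open_dataset('xarray/tests/data/example.ict',
                             engine='pseudonetcdf',
                             backend_kwargs={'format': 'ffi1001'})
        assert 'TEST_ppbv' in ds.data_vars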
31 changes: 31 additions & 0 deletions xarray/tests/data/example.ict
@@ -0,0 +1,31 @@
27, 1001
Henderson, Barron
U.S. EPA
Example file with artificial data
JUST_A_TEST
1, 1
2018, 04, 27, 2018, 04, 27
0
Start_UTC
7
1, 1, 1, 1, 1
-9999, -9999, -9999, -9999, -9999
lat, degrees_north
lon, degrees_east
elev, meters
TEST_ppbv, ppbv
TESTM_ppbv, ppbv
0
8
ULOD_FLAG: -7777
ULOD_VALUE: N/A
LLOD_FLAG: -8888
LLOD_VALUE: N/A, N/A, N/A, N/A, 0.025
OTHER_COMMENTS: www-air.larc.nasa.gov/missions/etc/IcarttDataFormat.htm
REVISION: R0
R0: No comments for this revision.
Start_UTC, lat, lon, elev, TEST_ppbv, TESTM_ppbv
43200, 41.00000, -71.00000, 5, 1.2345, 2.220
46800, 42.00000, -72.00000, 15, 2.3456, -9999
50400, 42.00000, -73.00000, 20, 3.4567, -7777
50400, 42.00000, -74.00000, 25, 4.5678, -8888
Binary file added xarray/tests/data/example.uamiv