Extracting file handles from DataArray #10320
Replies: 1 comment
You could reach deep into the lazy indexing classes used behind the scenes, but it would be extremely hacky. Ultimately, what you're trying to do is not really supported by xarray. Xarray doesn't keep file handles around explicitly because it's not tied to files; its canonical representation is in-memory. In general, combining data from multiple files implies that the result refers to the original files in an unboundedly complicated way, so we don't attempt to track that.
Because of that, in my opinion what you're trying to do overall (treat netCDF files and Xarray objects equally in your function) doesn't really make sense as a design: they are fundamentally different. One is on-disk, the other is (potentially lazily) in-memory. In-memory chunks for computation and on-disk chunks for compressed storage are completely separate concepts. Note that the user-provided chunks never have to match the on-disk chunks, and there are workflows where it makes sense to open several on-disk chunks as one in-memory chunk, for example.
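To make the distinction between in-memory and on-disk chunks concrete, here is a minimal sketch in plain Python (no xarray involved) that counts how many on-disk chunks each in-memory chunk overlaps along a single dimension. The function name `disk_chunks_touched` and the sizes used are hypothetical and chosen only for illustration; real readers may also cache or coalesce reads, which this ignores.

```python
def disk_chunks_touched(dim_len, disk_chunk, mem_chunk):
    """For each in-memory chunk along one dimension, count the on-disk chunks it overlaps."""
    counts = []
    for start in range(0, dim_len, mem_chunk):
        stop = min(start + mem_chunk, dim_len)  # exclusive end of this in-memory chunk
        first = start // disk_chunk             # index of the first disk chunk overlapped
        last = (stop - 1) // disk_chunk         # index of the last disk chunk overlapped
        counts.append(last - first + 1)
    return counts


# Hypothetical sizes: a dimension of length 100 stored in disk chunks of 10.
aligned = disk_chunks_touched(dim_len=100, disk_chunk=10, mem_chunk=20)
misaligned = disk_chunks_touched(dim_len=100, disk_chunk=10, mem_chunk=15)

# There are 10 disk chunks in total. A sum above 10 means some disk chunks
# are overlapped by more than one in-memory chunk.
print(sum(aligned), sum(misaligned))
```

With the aligned choice, each in-memory chunk covers exactly two disk chunks and no disk chunk is shared. With the misaligned choice, some disk chunks straddle two in-memory chunks and would have to be read (and decompressed) more than once.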
Improving this warning within xarray to be more informative seems like a much better solution to your original motivating problem. We would welcome a PR to improve this!
I've been working on some functionality that lets users inspect either:
in order to validate the user-supplied chunking passed to xarray when opening a dataset. The idea is that by inspecting the netCDF disk chunks, we can easily check whether the user-provided chunks are integer multiples of the disk chunks (if not, we expect performance issues), and adjust them to match the disk chunks if they are not.
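The check and adjustment described above can be sketched roughly as follows. This is not the code from the linked tool; `misaligned_dims` and `align_chunks` are hypothetical helpers, and the sketch assumes both chunk specifications are plain dicts mapping dimension names to positive integers (so values such as "auto" or -1 are not handled). How the disk chunk sizes are obtained is left out.

```python
def misaligned_dims(user_chunks, disk_chunks):
    """Return the dimensions whose user chunk is not an integer multiple of the disk chunk."""
    return [
        dim
        for dim, size in user_chunks.items()
        if dim in disk_chunks and size % disk_chunks[dim] != 0
    ]


def align_chunks(user_chunks, disk_chunks):
    """Round each user chunk to the nearest multiple of its disk chunk (at least one disk chunk)."""
    aligned = {}
    for dim, size in user_chunks.items():
        disk = disk_chunks.get(dim)
        if disk is None:
            # No disk chunk information for this dimension: leave it unchanged.
            aligned[dim] = size
            continue
        # Integer round-half-up of size / disk, with a floor of one disk chunk.
        multiple = max(1, (size + disk // 2) // disk)
        aligned[dim] = multiple * disk
    return aligned


# Hypothetical chunk sizes, for illustration only.
user = {"time": 15, "lat": 180, "lon": 100}
disk = {"time": 10, "lat": 90, "lon": 64}

bad = misaligned_dims(user, disk)
fixed = align_chunks(user, disk)
print(bad, fixed)
```

Rounding to the nearest multiple keeps the adjusted chunks close to what the user asked for. Rounding down instead would be the safer choice when memory per chunk is the main concern.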
Relevant functionality is here and a PR improving the functionality here.
However, I'm stumped on whether it's possible to actually extract all the file handles from an `xr.DataArray`, if opened with `xr.open_mfdataset`:
A dataset opened with `xr.open_mfdataset` will have `._close` attributes from which the file handles can be extracted with the above logic. However, creating a `DataArray` from that dataset will set the `.encoding['source']` attribute to the first file handle in the list of paths passed in to open the dataset, and I seem to lose any access to the full set of file handles.
I assume since Datasets & DataArrays are lazily loaded that there must still exist some sort of file handle somewhere which could be accessed somehow, even if it is a bit hacky.
Incidentally, AFAIK xarray doesn't really provide any mechanism to confirm that user-provided chunks line up nicely with disk chunks, which is the gap the tool I've been working on aims to address (I know a warning is emitted if the chunking splits disk chunks, but no info is given on how to fix it). If this can be done cleanly, I'm happy to open a PR adding the functionality if the community thinks it would be useful.