Reading netcdf file with string coordinates makes IPython kernel crash (netcdf4 engine) #8544
Comments
Can you read the file with netcdf4-python alone, removing xarray from the mix? If that also crashes, please open an issue there; unfortunately, we can't do anything about it in that case.
I can, if I close the dataset each time. I tested with the following cell; again, it has to be run several times before the bug occurs:

```python
from netCDF4 import Dataset

# Opening without closing: running this cell repeatedly eventually crashes the kernel.
da = Dataset("test.nc", "r", format="NETCDF4")
# da.close()  # with this line enabled, the crash does not occur
```

So maybe xarray does not close the dataset properly. Also, the bug still occurs if I add
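Since closing the dataset each time avoids the crash here, one way to guarantee the handle is released, even if reading raises, is a `with` block. This is only a sketch: `read_variable_names` is a hypothetical helper, and it relies on `contextlib.closing`, which works for any object with a `close()` method such as `netCDF4.Dataset` (which also supports `with` directly).

```python
from contextlib import closing


def read_variable_names(open_fn, path):
    """Open `path` with `open_fn`, list its variables, and always close the handle.

    Hypothetical helper: `open_fn` is any callable returning an object with a
    `variables` mapping and a `close()` method, e.g. netCDF4.Dataset.
    """
    with closing(open_fn(path, "r")) as ds:
        # Copy the names out before the handle is closed on leaving the block.
        return list(ds.variables)
```

Usage would look like `read_variable_names(netCDF4.Dataset, "test.nc")`; the returned list is plain Python data, so nothing keeps the file open afterwards.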
Does it reproduce in a normal Python script, outside of a notebook?
It does for the line
Some of the difference is probably that Jupyter holds on to the outputs in its And if we can reproduce with I'll add: unfortunately, it's really difficult to engage with these sorts of issues across libraries without MCVEs, so to make much progress here we do need one. That can be difficult to generate, and probably requires looking at which objects are in memory and their size, or a verifiable case of xarray not closing a file properly.
Yes, I do think IPython plays some role in it: reading with xarray within a script is fine, and the bug is only triggered by reading the file repeatedly in a cell. It was strange to me that it would still happen when explicitly using So yes, it rather seems to be a strange interaction between netcdf4-python and IPython. Since it makes the kernel crash, it was a pain to find the source of the problem, hence my report, but using
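Because a segfault kills the kernel without a Python traceback, one option (a sketch, not something used in this thread) is to run the suspect snippet in a fresh interpreter with Python's built-in `faulthandler` enabled via `-X faulthandler`, which prints the Python stack at the moment of the crash. `crashed_with_segfault` is a hypothetical helper, and it assumes a POSIX system, where a process killed by a signal gets a negative return code.

```python
import signal
import subprocess
import sys


def crashed_with_segfault(code: str) -> bool:
    """Run `code` in a fresh interpreter and report whether it died from SIGSEGV.

    Assumes POSIX: a child killed by a signal has returncode == -signum.
    """
    proc = subprocess.run(
        # -X faulthandler makes the child dump its Python traceback on a segfault.
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
    )
    if proc.stderr:
        # Surface the child's stderr, including any faulthandler traceback.
        print(proc.stderr, file=sys.stderr)
    return proc.returncode == -signal.SIGSEGV
```

For example, `crashed_with_segfault('import xarray as xr; xr.open_dataset("test.nc", engine="netcdf4")')` runs the open once outside IPython. A crash that only appears under IPython may not reproduce this way, but the same flag works for a whole script: `python -X faulthandler script.py`.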
@Paul-Aime That's a valid point @max-sixty has raised. If the output is not bound to a variable, it will be bound to the cell somehow and might not be cleaned up. Please check whether wrapping it in a `print` statement or a `display` call changes the situation.
@kmuehlbauer Indeed, wrapping it inside a `print` statement does not trigger the bug.
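The mechanism discussed above can be illustrated without xarray. IPython stores the value of a cell's final bare expression in its output cache (`Out[n]`, plus the `_`, `__` and `___` shortcuts), so that object stays referenced after the cell finishes; `print(...)` returns `None`, so nothing is cached. The sketch below mimics this with a plain dict (`out_cache`, `run_cell` and `Handle` are hypothetical names) and uses `weakref` to check which objects survive. It only shows why a displayed dataset is kept alive, not why that leads to a segfault.

```python
import gc
import weakref


class Handle:
    """Hypothetical stand-in for an open dataset; not a netCDF4 or xarray class."""


# Plain-dict stand-in for IPython's Out cache, keyed by execution count.
out_cache = {}


def run_cell(execution_count, echo_result):
    """Simulate a cell that opens a handle and either echoes or prints it."""
    handle = Handle()
    if echo_result:
        # Like ending a cell with a bare `ds`: the result is stored in Out.
        out_cache[execution_count] = handle
    else:
        # Like `print(ds)`: the repr is shown, but print() returns None,
        # so nothing is cached.
        print(handle)
    return weakref.ref(handle)


echoed = run_cell(1, echo_result=True)
printed = run_cell(2, echo_result=False)
gc.collect()
print("echoed handle alive:", echoed() is not None)    # True
print("printed handle alive:", printed() is not None)  # False
```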
@Paul-Aime Is there anything we can do here? Otherwise, can we close this?
@kmuehlbauer It seems like the problem might be on the netcdf4-python side, so I think you can close this. May I ask whether there is a way to make h5netcdf the default engine? As things stand, the default is not reliably usable in notebooks.
It looks like there is no option for that. You could tweak your installation by moving "h5netcdf" to the first position here: xarray/xarray/backends/plugins.py, line 26 (at commit 03ec3cb)
|
Thanks! It even looks like it works when done dynamically with
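As an alternative to editing the installed `plugins.py` (not what was done in this thread, just a sketch), `functools.partial` can pre-fill `engine="h5netcdf"` for your own calls; keyword arguments passed at call time override pre-filled ones, so `engine="netcdf4"` can still be requested explicitly. `default_engine` is a hypothetical helper, and this only affects code that calls the wrapper, not libraries that call `xarray.open_dataset` directly.

```python
import functools


def default_engine(open_fn, engine="h5netcdf"):
    """Return `open_fn` with `engine` pre-filled as a keyword argument.

    Call-time keyword arguments take precedence over the pre-filled ones.
    """
    return functools.partial(open_fn, engine=engine)
```

Usage would be `open_h5 = default_engine(xr.open_dataset)` followed by `ds = open_h5("test.nc")`; the same wrapper works for `xr.open_mfdataset`, which also accepts `engine`.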
@Paul-Aime I'm going to close this for now. Please feel free to reopen if there is anything to do on the xarray side of things.
For future users: I also encountered this issue -- but both using Also using My datasets were assigned to variables, and I did not have string coordinates; however, I did have duplicated coordinates as described in #3731, so I think this issue can manifest more broadly. Trying the suggested/documented solution for duplicated coordinates of renaming coordinates (i.e. For reference, my code always crashed on the second run of the cell.
What happened?
Opening a netCDF file that has strings as coordinates makes the notebook kernel crash.
This only happens with `engine="netcdf4"`, not with `engine="h5netcdf"`.
The bug occurs at least in IPython, in Jupyter in the web browser, and in VSCode notebooks.
The bug can be reproduced consistently when the same file is read twice from the same cell, for example by running that cell twice.
What did you expect to happen?
I expected `engine="netcdf4"` to work the same as `engine="h5netcdf"`, i.e. without crashing the kernel.
Minimal Complete Verifiable Example
MVCE confirmation
Relevant log output
IPython crashes with:
Segmentation fault (core dumped)
Jupyter Notebook logs:
VSCode notebook Jupyter logs:
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.12.0 | packaged by conda-forge | (main, Oct 3 2023, 08:43:22) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-91-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2
xarray: 2023.12.0
pandas: 2.1.4
numpy: 1.26.2
scipy: None
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.3.0
h5py: 3.10.0
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.2.2
pip: 23.3.1
conda: None
pytest: None
mypy: None
IPython: 8.18.1
sphinx: None