DataTree html repr is very slow #10052

Open
5 tasks done
Illviljan opened this issue Feb 16, 2025 · 2 comments
Labels
bug, topic-DataTree, topic-html-repr, topic-performance

Comments

@Illviljan
Contributor

What happened?

The DataTree html repr is very slow.

What did you expect to happen?

Evaluating dt in a notebook cell should not take longer than a second.

Minimal Complete Verifiable Example

import numpy as np

import xarray as xr

number_of_files = 700
number_of_groups = 5
number_of_variables = 10

datasets = {}
for f in range(number_of_files):
    for g in range(number_of_groups):
        # Create random data
        time = np.linspace(0, 50 + f, 1 + 1000 * g)
        y = f * time + g

        # Create dataset:
        ds = xr.Dataset(
            data_vars={
                f"temperature_{g}{i}": ("time", y)
                for i in range(number_of_variables // number_of_groups)
            },
            coords={"time": ("time", time)},
        ).chunk()

        # Prepare for xr.DataTree:
        name = f"file_{f}/group_{g}"
        datasets[name] = ds


dt = xr.DataTree.from_dict(datasets)
%timeit dt._repr_html_()
# 37.4 s ± 5.37 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit dt.__repr__()
# 2.58 s ± 182 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

Anything else we need to know?

A decent workaround is print(dt) or dt.__repr__(), but both are noticeably harder to type.

Environment

INSTALLED VERSIONS

commit: None
python: 3.12.4 | packaged by conda-forge | (main, Jun 17 2024, 10:04:44) [MSC v.1940 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: ('Swedish_Sweden', '1252')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2024.7.1.dev363+g99426cbb.d20240904
pandas: 2.2.2
numpy: 2.2.1
scipy: 1.14.1
netCDF4: 1.7.1
pydap: 3.5
h5netcdf: 1.3.0
h5py: 3.11.0
zarr: 2.18.2
cftime: 1.6.4
nc_time_axis: 1.4.1
iris: 3.9.0
bottleneck: 1.4.0
dask: 2024.11.2
distributed: 2024.11.2
matplotlib: 3.9.2
cartopy: 0.23.0
seaborn: 0.13.2
numbagg: None
fsspec: 2024.6.1
cupy: None
pint: None
sparse: None
flox: 0.9.10
numpy_groupies: 0.11.2
setuptools: 73.0.1
pip: 24.2
conda: None
pytest: 8.3.2
mypy: 1.14.1
IPython: 8.27.0
sphinx: 8.0.2

Illviljan added the bug, needs triage, topic-performance, topic-html-repr, and topic-DataTree labels on Feb 16, 2025
@shoyer
Member

shoyer commented Feb 18, 2025

I agree that this is problematic. I think we need to truncate the repr for large trees.
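
To make the truncation idea concrete, here is a minimal sketch on a plain nested dict rather than a real DataTree. It is not xarray's implementation: the function render_tree_html, its max_children and max_depth parameters, and the default limits are all hypothetical and chosen only for illustration. The point is that the amount of HTML produced is bounded by the limits instead of growing with the number of nodes.

```python
from html import escape


def render_tree_html(name, children, max_children=3, max_depth=2, _depth=0):
    """Render a nested dict as nested <ul> lists, eliding nodes past the limits.

    `children` maps child names to nested dicts of the same shape.
    """
    label = escape(name)
    if not children:
        return f"<li>{label}</li>"
    if _depth >= max_depth:
        # Too deep: summarise instead of recursing further.
        return f"<li>{label} ({len(children)} children not shown)</li>"
    items = list(children.items())
    shown = items[:max_children]
    parts = [
        render_tree_html(key, value, max_children, max_depth, _depth + 1)
        for key, value in shown
    ]
    hidden = len(items) - len(shown)
    if hidden > 0:
        # Too wide: replace the remaining siblings with a single summary line.
        parts.append(f"<li>... {hidden} more</li>")
    return f"<li>{label}<ul>{''.join(parts)}</ul></li>"


# A tree shaped like the one in the report: 700 files with 5 groups each.
tree = {f"file_{f}": {f"group_{g}": {} for g in range(5)} for f in range(700)}
html = "<ul>" + render_tree_html("root", tree) + "</ul>"
print(len(html))
```

With these limits only the first three files and the first three groups of each are rendered, so the output stays small no matter how many files the tree holds. A real implementation would presumably also want to show the last few nodes and make the limits configurable.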

@benbovy
Member

benbovy commented Feb 24, 2025

#9350 #9511

dcherian removed the needs triage label on Mar 17, 2025
4 participants