Add nbytes to repr? #8690
Comments
I agree - I'm constantly checking this attribute. It would be nice to also quickly see the total nbytes of the whole dataset, but I'm not sure where that would go in the repr.
Yes very much agree. Maaaybe after Or if there's some consensus about adding to data vars, we could start with that. Though arguably it's more useful to have for the whole object...
This would be a really nice addition!
Hello, I can suggest the following:
If more customization capabilities are needed, e.g. choosing between "human" and "binary" prefixes, there is an MIT-licensed library, humanize, that specializes in rendering various numbers, including file sizes. Some of its code could potentially be extracted and integrated into xarray. Examples
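To make the prefix choice concrete, here is a minimal sketch of a size formatter in plain Python. The function name `format_nbytes` and its `binary` parameter are hypothetical, chosen for illustration; this is neither xarray's implementation nor humanize's API, just one way the decimal versus binary distinction could be handled.

```python
def format_nbytes(nbytes: int, binary: bool = False) -> str:
    """Render a byte count with decimal (kB, MB) or binary (KiB, MiB) prefixes.

    Hypothetical helper for illustration only.
    """
    base = 1024 if binary else 1000
    units = (
        ["KiB", "MiB", "GiB", "TiB", "PiB"]
        if binary
        else ["kB", "MB", "GB", "TB", "PB"]
    )
    # Small values are shown as an exact byte count, e.g. "24B".
    if nbytes < base:
        return f"{nbytes}B"
    value = float(nbytes)
    unit = units[0]
    # Divide by the base until the value fits under the next prefix
    # (or the largest known prefix is reached).
    for unit in units:
        value /= base
        if value < base:
            break
    return f"{value:.1f}{unit}"


print(format_nbytes(24))
print(format_nbytes(1536, binary=True))
print(format_nbytes(1_500_000))
```

The same byte count reads differently under the two conventions (1024 bytes is 1.0KiB with binary prefixes but 1.0kB after rounding with decimal ones), which is why exposing the choice as an option might be worthwhile.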
Updated examples after update on the PR. The update is: the size in the header is outside of the
As there are many representations scattered across the code, both in tests and doctests (800+ occurrences when looking for the string Summary of the current repr:
This padding may be irrelevant, as it only preserves layout in some specific cases (same dimension tuple width and same dtype width). It could make sense to (1) move the size before the dimension tuple and dtype and keep its fixed width, or (2) keep its current position but remove the padding. (1) allows quick size comparison of variables by eye thanks to the fixed width but puts the focus on the size, while (2) is more minimalistic but maybe less readable for size comparison.
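The trade-off between the two options can be sketched with plain string formatting. The row layout, the helper names `option_1` and `option_2`, and the sample variables are all made up for illustration; they do not reproduce xarray's actual formatting code.

```python
# Hypothetical variable summaries: (name, dims, dtype, size string).
rows = [
    ("temperature", ("time", "lat", "lon"), "float64", "8MB"),
    ("mask", ("lat", "lon"), "bool", "1kB"),
]


def option_1(name, dims, dtype, size, width=5):
    # (1) Size comes before dims and dtype, right-aligned to a fixed width,
    # so sizes line up in a column regardless of dims and dtype lengths.
    return f"{name} {size:>{width}} ({', '.join(dims)}) {dtype}"


def option_2(name, dims, dtype, size):
    # (2) Size stays after the dtype, with no padding at all.
    return f"{name} ({', '.join(dims)}) {dtype} {size}"


for row in rows:
    print(option_1(*row))
for row in rows:
    print(option_2(*row))
```

With (1) the sizes sit at a predictable offset after the name, while with (2) their position depends on the width of the dimension tuple and dtype in each row.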
Just a quick comment: @max-sixty wrote pytest-accept for this kind of thing. It's pretty great :) I prefer (2) because dimension names are usually what I look for first in the variable repr.
I really wish I had read your comment, @dcherian, after getting hit with this change in my own CI and spending an hour tracking down each new difference (and only in 4 small example documents). Sphinx apparently really doesn't want to render docs in a consistent order... or maybe doctest isn't running the tests in the same order. As someone who has never checked the raw bytes size of an array, I was surprised when I tracked down this change and saw so many people eager to have it. I guess that just shows how many use cases there are for something so "simple" (the repr).
Maybe we should add an option to opt out? Or is it better to have a canonical repr?
Probably just one repr since it seems like enough people want this. If more people come here to complain then maybe revisit the idea?
I'd like to reopen this issue to discuss the repr. My main complaint is that the repr says something like "Size: 24B" but I also think this information is generally less relevant on individual coordinates/variables inside the repr, and would prefer not to include it there in favor of showing more data values (unless that data values are hidden, e.g., for a lazy array, in which case we could show memory usage instead). So my suggestion is something like:
FWIW I find the size per variable quite helpful. I'm often working with datasets with very differently sized variables. We should do the utilitarian thing, very possibly I'm less representative
We don't technically need the term
Hello, About display of individual variables'
This
Is your feature request related to a problem?
Would having the nbytes value in the Dataset repr be reasonable?
I frequently find myself logging this separately. For example:
Describe the solution you'd like
No response
Describe alternatives you've considered
Status quo :)
Additional context
No response