Micro-optimize dataset.isel for speed on large datasets
This targets optimization for datasets with many "scalar" variables
(that is, variables without any dimensions). This can happen when a
dataset carries many small pieces of metadata describing various facts
about an experimental condition.
For example, we have about 80 of these in our datasets (and I want to
increase this number).
Our datasets are quite large (on the order of 1 TB uncompressed), so we
often have one dimension whose length is in the tens of thousands.
However, indexing into the dataset has become quite slow.
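
For concreteness, here is a minimal sketch of the kind of layout I mean
(the names and sizes are hypothetical, not our production data):

```python
import numpy as np
import xarray as xr

# Hypothetical layout, roughly mirroring the description above: one
# long dimension plus many dimensionless "scalar" metadata variables.
ds = xr.Dataset(
    {
        "signal": ("time", np.random.rand(50_000)),
        **{f"meta_{i}": ((), i) for i in range(80)},
    }
)

# Every isel call has to visit all ~81 variables, so the scalar
# metadata dominates the per-call overhead.
sample = ds.isel(time=1000)
```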
We therefore often "carefully slice out the metadata we need" before
doing anything with our dataset, but that isn't really possible when
you want to orchestrate things with a parent application.
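
A sketch of that workaround, using the same hypothetical layout as
above:

```python
import numpy as np
import xarray as xr

# Same hypothetical layout as the sketch above.
ds = xr.Dataset(
    {
        "signal": ("time", np.random.rand(50_000)),
        **{f"meta_{i}": ((), i) for i in range(80)},
    }
)

# Pull the scalar metadata out once up front...
metadata = {name: ds[name].item() for name in ds.data_vars if ds[name].ndim == 0}

# ...then index a dataset holding only the dimensioned variables, so
# isel no longer pays the per-variable cost for the ~80 scalars.
data_only = ds.drop_vars(list(metadata))
for i in range(1_000):
    sample = data_only.isel(time=i)
```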
These optimizations are likely "minor", but considering the results of
the benchmark (in iterations per second), I think they are quite
worthwhile:
* main (as of pydata#9001): 2.5k its/s
* with pydata#9002: 4.2k its/s
* with this pull request (on top of pydata#9002): 6.1k its/s
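
For reference, an illustrative microbenchmark of the kind of loop being
measured (this is not necessarily the exact setup behind the numbers
above):

```python
import timeit

import numpy as np
import xarray as xr

# Same hypothetical layout as the sketches above.
ds = xr.Dataset(
    {
        "signal": ("time", np.random.rand(50_000)),
        **{f"meta_{i}": ((), i) for i in range(80)},
    }
)

# Time repeated isel calls and report throughput in iterations/s.
n = 2_000
seconds = timeit.timeit(lambda: ds.isel(time=500), number=n)
print(f"{n / seconds:,.0f} isel calls/s")
```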
Thanks for considering.