Skip to content

Simplify BucketedSort/Teach BitArray a useful trick (backport #53199) #53240

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 6, 2020

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Mar 6, 2020

Our lovely BitArray compactly stores "flags", lazilly growing its
underlying storage. It is super useful when you need to store one bit of
data for a zillion buckets or a documents or something. Usefully, it
defaults to false. But there is a wrinkle! If you ask it whether or
not a bit is set but it hasn't grown its underlying storage array
"around" that index then it'll throw an ArrayIndexOutOfBoundsException.
The per-document use cases tend to show up in order and don't tend to
mind this too much. But the use case in aggregations, the per-bucket use
case, does. Because buckets are collected out of order all the time.

This changes BitArray so it'll return false if the index is too big
for the underlying storage. After all, that index can't have been set
or else we would have grown the underlying array. Logically, I believe
this makes sense. And it makes my life easy. At the cost of three lines.

but this adds an extra test to every call to get. I think this is
likely ok because it is "very close" to an array index lookup that
already runs the same test. So I think it'll end up merged with the
array bounds check.

Our lovely `BitArray` compactly stores "flags", lazilly growing its
underlying storage. It is super useful when you need to store one bit of
data for a zillion buckets or a documents or something. Usefully, it
defaults to `false`. But there is a wrinkle! If you ask it whether or
not a bit is set but it hasn't grown its underlying storage array
"around" that index then it'll throw an `ArrayIndexOutOfBoundsException`.
The per-document use cases tend to show up in order and don't tend to
mind this too much. But the use case in aggregations, the per-bucket use
case, does. Because buckets are collected out of order all the time.

This changes `BitArray` so it'll return `false` if the index is too big
for the underlying storage. After all, that index *can't* have been set
or else we would have grown the underlying array. Logically, I believe
this makes sense. And it makes my life easy. At the cost of three lines.

*but* this adds an extra test to every call to `get`. I think this is
likely ok because it is "very close" to an array index lookup that
already runs the same test. So I *think* it'll end up merged with the
array bounds check.
@nik9000 nik9000 changed the title Simplify BucketedSort (#53199) Simplify BucketedSort/Teach BitArray a useful trick (backport #53199) Mar 6, 2020
@nik9000 nik9000 merged commit 7c9641e into elastic:7.x Mar 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant