Implement parallel processing of different kmeans clusters in vector index #17247
Comments
Try this
On the dataset at https://wiki.yandex-team.ru/kikimr/developers/datashard/vector-search/getting-stated-alice-logs/ the first-level build will be fine, but the next one (the second, I guess) will not.
To reproduce it, you probably need more than one shard of some temporary build table, not of the source table.
It seems I reproduced it, at least to some extent. During the last index build, TReshuffleKMeansScan always ran in parallel, while some of the TLocalKMeansScan scans within a single node ran sequentially. Others, however, also ran in parallel.
As far as I know, we expect every cluster on levels >1 to be in a single data shard.
It seems that the code allows more than one shard per cluster, here: However, in my case it skips all .Global shards except the last one on the last level for some reason, and that causes #18355 :)
Now it seems that we run shard scans sequentially:
It seems this needs to be fixed.
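To make the difference concrete, here is a minimal, language-neutral sketch of sequential versus parallel shard scans. It is not the actual index build code: `ScanTracker`, `scan_shard`, `run_sequential`, `run_parallel`, and `max_in_flight` are hypothetical names, and a short sleep stands in for a real scan. The tracker records how many scans were in flight at the same time, which is the property this issue is about.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ScanTracker:
    """Hypothetical stand-in for a shard scanner that records peak concurrency."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = 0
        self.peak = 0  # highest number of scans observed running at once

    def scan_shard(self, shard_id):
        with self._lock:
            self._in_flight += 1
            self.peak = max(self.peak, self._in_flight)
        time.sleep(0.05)  # simulated scan work (assumption: I/O-bound)
        with self._lock:
            self._in_flight -= 1
        return shard_id


def run_sequential(tracker, shard_ids):
    # One scan at a time: the next shard starts only after the previous one ends.
    return [tracker.scan_shard(s) for s in shard_ids]


def run_parallel(tracker, shard_ids, max_in_flight=4):
    # Up to max_in_flight scans run concurrently; results keep the input order.
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(tracker.scan_shard, shard_ids))
```

In the sequential variant the peak number of in-flight scans stays at 1, while in the parallel variant it is bounded by `max_in_flight`. A bound of this kind is one possible way to keep a single node from being overloaded when many clusters are processed at once.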