Implement parallel processing of different kmeans clusters in vector index #17247

kunga · 2025-04-15T19:30:34Z

Now it seems that we run shard scans sequentially:

2025-04-16T10:37:50.1744789070321773Z :BUILD_INDEX NOTICE: Starting TLocalKMeansScan TabletId: 72075186224083200 Id: 281474977825662 TabletId: 72075186224083200 PathId { OwnerId: 72075186224037897 LocalId: 68 } SeqNoGeneration: 25 SeqNoRound: 1 Settings { metric: DISTANCE_COSINE vector_type: VECTOR_TYPE_FLOAT vector_dimension: 1024 } Seed: 72075186224083200 K: 200 Upload: UPLOAD_BUILD_TO_POSTING NeedsRounds: 3 ParentFrom: 42 Child: 8401 LevelName: "/Root/testdb/alice/idx_vector/indexImplLevelTable" PostingName: "/Root/testdb/alice/idx_vector/indexImplPostingTable" EmbeddingColumn: "embedding" ParentTo: 42 row version v1744789070000/18446744073709551615
2025-04-16T10:37:50.1744789070320288Z :BUILD_INDEX NOTICE: Done TLocalKMeansScan TabletId: 72075186224083410 Id: 281474977825662 Parent: 41 Child: 8201 K: 200 Clusters: 0 State: SAMPLE Round: 0 / 3 LevelBuf size: 0 PostingBuf size: 0 UploadTable: /Root/testdb/alice/idx_vector/indexImplPostingTable UploadBuf size: 0 RetryCount: 0 Id: 281474977825662 TabletId: 72075186224083410 RequestSeqNoGeneration: 25 RequestSeqNoRound: 1 Status: DONE UploadRows: 618663 UploadBytes: 14431986 ReadRows: 3092315 ReadBytes: 12706322335
2025-04-16T10:40:13.1744789213372628Z :BUILD_INDEX NOTICE: Starting TLocalKMeansScan TabletId: 72075186224082996 Id: 281474977825662 TabletId: 72075186224082996 PathId { OwnerId: 72075186224037897 LocalId: 68 } SeqNoGeneration: 25 SeqNoRound: 1 Settings { metric: DISTANCE_COSINE vector_type: VECTOR_TYPE_FLOAT vector_dimension: 1024 } Seed: 72075186224082996 K: 200 Upload: UPLOAD_BUILD_TO_POSTING NeedsRounds: 3 ParentFrom: 44 Child: 8801 LevelName: "/Root/testdb/alice/idx_vector/indexImplLevelTable" PostingName: "/Root/testdb/alice/idx_vector/indexImplPostingTable" EmbeddingColumn: "embedding" ParentTo: 44 row version v1744789213000/18446744073709551615
2025-04-16T10:40:13.1744789213370433Z :BUILD_INDEX NOTICE: Done TLocalKMeansScan TabletId: 72075186224083200 Id: 281474977825662 Parent: 42 Child: 8401 K: 200 Clusters: 0 State: SAMPLE Round: 0 / 3 LevelBuf size: 0 PostingBuf size: 0 UploadTable: /Root/testdb/alice/idx_vector/indexImplPostingTable UploadBuf size: 0 RetryCount: 0 Id: 281474977825662 TabletId: 72075186224083200 RequestSeqNoGeneration: 25 RequestSeqNoRound: 1 Status: DONE UploadRows: 568754 UploadBytes: 13333988 ReadRows: 2842770 ReadBytes: 11680941930
2025-04-16T10:44:03.1744789443712769Z :BUILD_INDEX NOTICE: Done TLocalKMeansScan TabletId: 72075186224082996 Id: 281474977825662 Parent: 44 Child: 8801 K: 200 Clusters: 0 State: SAMPLE Round: 0 / 3 LevelBuf size: 0 PostingBuf size: 0 UploadTable: /Root/testdb/alice/idx_vector/indexImplPostingTable UploadBuf size: 0 RetryCount: 0 Id: 281474977825662 TabletId: 72075186224082996 RequestSeqNoGeneration: 25 RequestSeqNoRound: 1 Status: DONE UploadRows: 945349 UploadBytes: 21619078 ReadRows: 4725745 ReadBytes: 19418086205
2025-04-16T10:44:03.1744789443714739Z :BUILD_INDEX NOTICE: Starting TLocalKMeansScan TabletId: 72075186224083228 Id: 281474977825662 TabletId: 72075186224083228 PathId { OwnerId: 72075186224037897 LocalId: 68 } SeqNoGeneration: 25 SeqNoRound: 1 Settings { metric: DISTANCE_COSINE vector_type: VECTOR_TYPE_FLOAT vector_dimension: 1024 } Seed: 72075186224083228 K: 200 Upload: UPLOAD_BUILD_TO_POSTING NeedsRounds: 3 ParentFrom: 45 Child: 9001 LevelName: "/Root/testdb/alice/idx_vector/indexImplLevelTable" PostingName: "/Root/testdb/alice/idx_vector/indexImplPostingTable" EmbeddingColumn: "embedding" ParentTo: 45 row version v1744789443000/18446744073709551615
2025-04-16T10:45:56.1744789556696726Z :BUILD_INDEX NOTICE: Done TLocalKMeansScan TabletId: 72075186224083228 Id: 281474977825662 Parent: 45 Child: 9001 K: 200 Clusters: 0 State: SAMPLE Round: 0 / 3 LevelBuf size: 0 PostingBuf size: 0 UploadTable: /Root/testdb/alice/idx_vector/indexImplPostingTable UploadBuf size: 0 RetryCount: 0 Id: 281474977825662 TabletId: 72075186224083228 RequestSeqNoGeneration: 25 RequestSeqNoRound: 1 Status: DONE UploadRows: 456045 UploadBytes: 10854390 ReadRows: 2279225 ReadBytes: 9365335525
2025-04-16T10:45:56.1744789556698333Z :BUILD_INDEX NOTICE: Starting TLocalKMeansScan TabletId: 72075186224083058 Id: 281474977825662 TabletId: 72075186224083058 PathId { OwnerId: 72075186224037897 LocalId: 68 } SeqNoGeneration: 25 SeqNoRound: 1 Settings { metric: DISTANCE_COSINE vector_type: VECTOR_TYPE_FLOAT vector_dimension: 1024 } Seed: 72075186224083058 K: 200 Upload: UPLOAD_BUILD_TO_POSTING NeedsRounds: 3 ParentFrom: 46 Child: 9201 LevelName: "/Root/testdb/alice/idx_vector/indexImplLevelTable" PostingName: "/Root/testdb/alice/idx_vector/indexImplPostingTable" EmbeddingColumn: "embedding" ParentTo: 46 row version v1744789556000/18446744073709551615
2025-04-16T10:48:05.1744789685561006Z :BUILD_INDEX NOTICE: Starting TLocalKMeansScan TabletId: 72075186224083322 Id: 281474977825662 TabletId: 72075186224083322 PathId { OwnerId: 72075186224037897 LocalId: 68 } SeqNoGeneration: 25 SeqNoRound: 1 Settings { metric: DISTANCE_COSINE vector_type: VECTOR_TYPE_FLOAT vector_dimension: 1024 } Seed: 72075186224083322 K: 200 Upload: UPLOAD_BUILD_TO_POSTING NeedsRounds: 3 ParentFrom: 47 Child: 9401 LevelName: "/Root/testdb/alice/idx_vector/indexImplLevelTable" PostingName: "/Root/testdb/alice/idx_vector/indexImplPostingTable" EmbeddingColumn: "embedding" ParentTo: 47 row version v1744789685000/18446744073709551615
2025-04-16T10:48:05.1744789685559055Z :BUILD_INDEX NOTICE: Done TLocalKMeansScan TabletId: 72075186224083058 Id: 281474977825662 Parent: 46 Child: 9201 K: 200 Clusters: 0 State: SAMPLE Round: 0 / 3 LevelBuf size: 0 PostingBuf size: 0 UploadTable: /Root/testdb/alice/idx_vector/indexImplPostingTable UploadBuf size: 0 RetryCount: 0 Id: 281474977825662 TabletId: 72075186224083058 RequestSeqNoGeneration: 25 RequestSeqNoRound: 1 Status: DONE UploadRows: 517710 UploadBytes: 12211020 ReadRows: 2587550 ReadBytes: 10632242950
2025-04-16T10:52:03.1744789923112374Z :BUILD_INDEX NOTICE: Starting TLocalKMeansScan TabletId: 72075186224083426 Id: 281474977825662 TabletId: 72075186224083426 PathId { OwnerId: 72075186224037897 LocalId: 68 } SeqNoGeneration: 25 SeqNoRound: 1 Settings { metric: DISTANCE_COSINE vector_type: VECTOR_TYPE_FLOAT vector_dimension: 1024 } Seed: 72075186224083426 K: 200 Upload: UPLOAD_BUILD_TO_POSTING NeedsRounds: 3 ParentFrom: 48 Child: 9601 LevelName: "/Root/testdb/alice/idx_vector/indexImplLevelTable" PostingName: "/Root/testdb/alice/idx_vector/indexImplPostingTable" EmbeddingColumn: "embedding" ParentTo: 48 row version v1744789923000/18446744073709551615
2025-04-16T10:52:03.1744789923110602Z :BUILD_INDEX NOTICE: Done TLocalKMeansScan TabletId: 72075186224083322 Id: 281474977825662 Parent: 47 Child: 9401 K: 200 Clusters: 0 State: SAMPLE Round: 0 / 3 LevelBuf size: 0 PostingBuf size: 0 UploadTable: /Root/testdb/alice/idx_vector/indexImplPostingTable UploadBuf size: 0 RetryCount: 0 Id: 281474977825662 TabletId: 72075186224083322 RequestSeqNoGeneration: 25 RequestSeqNoRound: 1 Status: DONE UploadRows: 946163 UploadBytes: 21636986 ReadRows: 4729815 ReadBytes: 19434809835
2025-04-16T10:53:20.1744790000746881Z :BUILD_INDEX NOTICE: Starting TLocalKMeansScan TabletId: 72075186224083152 Id: 281474977825662 TabletId: 72075186224083152 PathId { OwnerId: 72075186224037897 LocalId: 68 } SeqNoGeneration: 25 SeqNoRound: 1 Settings { metric: DISTANCE_COSINE vector_type: VECTOR_TYPE_FLOAT vector_dimension: 1024 } Seed: 72075186224083152 K: 200 Upload: UPLOAD_BUILD_TO_POSTING NeedsRounds: 3 ParentFrom: 49 Child: 9801 LevelName: "/Root/testdb/alice/idx_vector/indexImplLevelTable" PostingName: "/Root/testdb/alice/idx_vector/indexImplPostingTable" EmbeddingColumn: "embedding" ParentTo: 49 row version v1744790000000/18446744073709551615
2025-04-16T10:53:20.1744790000745187Z :BUILD_INDEX NOTICE: Done TLocalKMeansScan TabletId: 72075186224083426 Id: 281474977825662 Parent: 48 Child: 9601 K: 200 Clusters: 0 State: SAMPLE Round: 0 / 3 LevelBuf size: 0 PostingBuf size: 0 UploadTable: /Root/testdb/alice/idx_vector/indexImplPostingTable UploadBuf size: 0 RetryCount: 0 Id: 281474977825662 TabletId: 72075186224083426 RequestSeqNoGeneration: 25 RequestSeqNoRound: 1 Status: DONE UploadRows: 312171 UploadBytes: 7689162 ReadRows: 1559855 ReadBytes: 6409444195
2025-04-16T10:56:41.1744790201623512Z :BUILD_INDEX NOTICE: Starting TLocalKMeansScan TabletId: 72075186224083390 Id: 281474977825662 TabletId: 72075186224083390 PathId { OwnerId: 72075186224037897 LocalId: 68 } SeqNoGeneration: 25 SeqNoRound: 1 Settings { metric: DISTANCE_COSINE vector_type: VECTOR_TYPE_FLOAT vector_dimension: 1024 } Seed: 72075186224083390 K: 200 Upload: UPLOAD_BUILD_TO_POSTING NeedsRounds: 3 ParentFrom: 50 Child: 10001 LevelName: "/Root/testdb/alice/idx_vector/indexImplLevelTable" PostingName: "/Root/testdb/alice/idx_vector/indexImplPostingTable" EmbeddingColumn: "embedding" ParentTo: 50 row version v1744790201000/18446744073709551615
2025-04-16T10:56:41.1744790201621157Z :BUILD_INDEX NOTICE: Done TLocalKMeansScan TabletId: 72075186224083152 Id: 281474977825662 Parent: 49 Child: 9801 K: 200 Clusters: 0 State: SAMPLE Round: 0 / 3 LevelBuf size: 0 PostingBuf size: 0 UploadTable: /Root/testdb/alice/idx_vector/indexImplPostingTable UploadBuf size: 0 RetryCount: 0 Id: 281474977825662 TabletId: 72075186224083152 RequestSeqNoGeneration: 25 RequestSeqNoRound: 1 Status: DONE UploadRows: 813621 UploadBytes: 18721062 ReadRows: 4067105 ReadBytes: 16711734445
2025-04-16T10:58:55.1744790335233102Z :BUILD_INDEX NOTICE: Done TLocalKMeansScan TabletId: 72075186224083390 Id: 281474977825662 Parent: 50 Child: 10001 K: 200 Clusters: 0 State: SAMPLE Round: 0 / 3 LevelBuf size: 0 PostingBuf size: 0 UploadTable: /Root/testdb/alice/idx_vector/indexImplPostingTable UploadBuf size: 0 RetryCount: 0 Id: 281474977825662 TabletId: 72075186224083390 RequestSeqNoGeneration: 25 RequestSeqNoRound: 1 Status: DONE UploadRows: 538834 UploadBytes: 12675748 ReadRows: 2693170 ReadBytes: 11066235530
2025-04-16T10:58:55.1744790335234346Z :BUILD_INDEX NOTICE: Starting TLocalKMeansScan TabletId: 72075186224083180 Id: 281474977825662 TabletId: 72075186224083180 PathId { OwnerId: 72075186224037897 LocalId: 68 } SeqNoGeneration: 25 SeqNoRound: 1 Settings { metric: DISTANCE_COSINE vector_type: VECTOR_TYPE_FLOAT vector_dimension: 1024 } Seed: 72075186224083180 K: 200 Upload: UPLOAD_BUILD_TO_POSTING NeedsRounds: 3 ParentFrom: 51 Child: 10201 LevelName: "/Root/testdb/alice/idx_vector/indexImplLevelTable" PostingName: "/Root/testdb/alice/idx_vector/indexImplPostingTable" EmbeddingColumn: "embedding" ParentTo: 51 row version v1744790335000/18446744073709551615

It seems need to be fixed

The text was updated successfully, but these errors were encountered:

vitalif · 2025-04-28T11:57:22Z

Not sure if it's actually broken. How to reproduce it?...
My test run after partitioning a test table into 8 partitions:

kunga · 2025-04-28T12:03:30Z

Try this

ALTER TABLE alice
ADD INDEX idx_vector
GLOBAL USING vector_kmeans_tree
ON (embedding)
WITH (distance=cosine, vector_type="float", vector_dimension=1024, levels=2, clusters=200);

On dataset https://wiki.yandex-team.ru/kikimr/developers/datashard/vector-search/getting-stated-alice-logs/

First level build will be fine, but the next (I guess second) not

kunga · 2025-04-28T12:04:44Z

You probably need >1 shards of some temporary build table, not a source one to reproduce it

vitalif · 2025-05-15T10:47:53Z

It seems I reproduced it at least to some extent... During the last index build, TReshuffleKMeansScan always ran in parallel, and some of TLocalKMeansScan within a single node ran sequentially. But some of them didn't and also ran in parallel...

vitalif · 2025-05-20T12:32:30Z

Discovered the cause accidentally (:D) during debugging #18355. Different clusters are really processed sequentially on levels > 1. Only some of them are processed in parallel - specifically, only clusters which only have 1 data shard, during the "MultiLocal" kmeans phase, introduced in #11539. :-)

kunga · 2025-05-20T12:42:35Z

As I know we expect all clusters on level >1 to be in 1 data shard

vitalif · 2025-05-20T12:58:31Z

As I know we expect all clusters on level >1 to be in 1 data shard

It seems that the code allows >1 shard per cluster, here:
https://github.com/ydb-platform/ydb/blob/main/ydb/core/tx/schemeshard/schemeshard_build_index__progress.cpp#L860

However, in my case it skips all .Global shards except the last one on the last level for some reason and it causes #18355 :)

kunga mentioned this issue Apr 15, 2025

👑 Global vector index #8967

Open

48 tasks

azevaykin assigned vitalif Apr 22, 2025

vitalif changed the title ~~Build vector index run sequentially~~ Implement parallel processing of different kmeans clusters in vector index Jun 4, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Implement parallel processing of different kmeans clusters in vector index #17247

Implement parallel processing of different kmeans clusters in vector index #17247

kunga commented Apr 15, 2025 •

edited

Loading

vitalif commented Apr 28, 2025

Uh oh!

kunga commented Apr 28, 2025

Uh oh!

kunga commented Apr 28, 2025

Uh oh!

vitalif commented May 15, 2025

Uh oh!

vitalif commented May 20, 2025 •

edited

Loading

Uh oh!

kunga commented May 20, 2025

Uh oh!

vitalif commented May 20, 2025

Uh oh!

Implement parallel processing of different kmeans clusters in vector index #17247

Implement parallel processing of different kmeans clusters in vector index #17247

Comments

kunga commented Apr 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

vitalif commented Apr 28, 2025

Uh oh!

kunga commented Apr 28, 2025

Uh oh!

kunga commented Apr 28, 2025

Uh oh!

vitalif commented May 15, 2025

Uh oh!

vitalif commented May 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kunga commented May 20, 2025

Uh oh!

vitalif commented May 20, 2025

Uh oh!

kunga commented Apr 15, 2025 •

edited

Loading

vitalif commented May 20, 2025 •

edited

Loading