
Vector index build exception #16813

Closed
Tracked by #8967
azevaykin opened this issue Apr 5, 2025 · 3 comments

@azevaykin
Collaborator

azevaykin commented Apr 5, 2025

SQL command

ALTER TABLE alice
ADD INDEX idx_vector
GLOBAL USING vector_kmeans_tree
ON (embedding)
WITH (distance=cosine, vector_type="float", vector_dimension=1024, levels=2, clusters=200);

SchemeShard is restarting constantly.
Exception:

2025-04-05T15:09:19.595435Z :BUILD_INDEX INFO: TIndexBuilder::TXTYPE_PROGRESS_INDEX_BUILD: TTxBuildProgress: Resume: id# 844424930876970
2025-04-05T15:09:19.595440Z :BUILD_INDEX DEBUG: TIndexBuilder::TXTYPE_PROGRESS_INDEX_BUILD: TTxBuildProgress: Resume: TBuildInfo{ IndexBuildId: 844424930876970, Uid: , DomainPathId: [OwnerId: 72075186224037897, LocalPathId: 1], TablePathId: [OwnerId: 72075186224037897, LocalPathId: 10], IndexType: EIndexTypeGlobalVectorKmeansTree, IndexName: idx_vector_langs, IndexColumn: langs, IndexColumn: embedding, State: Done, IsCancellationRequested: 0, Issue: , SubscribersCount: 0, CreateSender: [0:0:0], AlterMainTableTxId: 0, AlterMainTableTxStatus: StatusSuccess, AlterMainTableTxDone: 0, LockTxId: 562949954171413, LockTxStatus: StatusAccepted, LockTxDone: 1, InitiateTxId: 562949954171414, InitiateTxStatus: StatusAccepted, InitiateTxDone: 1, SnapshotStepId: 0, ApplyTxId: 562949954171417, ApplyTxStatus: StatusAccepted, ApplyTxDone: 1, UnlockTxId: 562949954171418, UnlockTxStatus: StatusAccepted, UnlockTxDone: 1, ToUploadShards: 0, DoneShards: 0, Processed: { upload rows: 2506934, upload bytes: 7741665134, read rows: 1003545, read bytes: 1617203692 }, Billed: { upload rows: 0, upload bytes: 0, read rows: 0, read bytes: 0 }}
2025-04-05T15:09:19.595442Z :BUILD_INDEX INFO: TIndexBuilder::TXTYPE_PROGRESS_INDEX_BUILD: TTxBuildProgress: Resume: id# 844424931086059
2025-04-05T15:09:19.595448Z :BUILD_INDEX DEBUG: TIndexBuilder::TXTYPE_PROGRESS_INDEX_BUILD: TTxBuildProgress: Resume: TBuildInfo{ IndexBuildId: 844424931086059, Uid: , DomainPathId: [OwnerId: 72075186224037897, LocalPathId: 1], TablePathId: [OwnerId: 72075186224037897, LocalPathId: 83], IndexType: EIndexTypeGlobalVectorKmeansTree, IndexName: idx_vector, IndexColumn: embedding, State: Filling, IsCancellationRequested: 0, Issue: , SubscribersCount: 0, CreateSender: [0:0:0], AlterMainTableTxId: 0, AlterMainTableTxStatus: StatusSuccess, AlterMainTableTxDone: 0, LockTxId: 844424931097069, LockTxStatus: StatusAccepted, LockTxDone: 1, InitiateTxId: 844424931097070, InitiateTxStatus: StatusAccepted, InitiateTxDone: 1, SnapshotStepId: 0, ApplyTxId: 0, ApplyTxStatus: StatusSuccess, ApplyTxDone: 0, UnlockTxId: 0, UnlockTxStatus: StatusSuccess, UnlockTxDone: 0, ToUploadShards: 0, DoneShards: 0, Processed: { upload rows: 439253717, upload bytes: 1799455653764, read rows: 439228126, read bytes: 1800226359090 }, Billed: { upload rows: 0, upload bytes: 0, read rows: 0, read bytes: 0 }}
2025-04-05T15:09:19.595469Z :BUILD_INDEX DEBUG: TIndexBuilder::TXTYPE_PROGRESS_INDEX_BUILD: FillIndex::SingleKMeans::Start { K = 200, Level = 2 / 2, Parent = [1..6..200], Child = [201..1201..40200], State = Local }, { Rows = 0, Sample = Collect }, { Done = 0, ToUpload = 1, InProgress = 0 }
2025-04-05T15:09:19.595655Z :TABLET_EXECUTOR CRIT: Tablet 72075186224037897 unhandled exception yexception: util/generic/hash_table.cpp:50: Key not found in hashtable: NKikimr::NSchemeShard::TShardIdx
0. /home/azevaykin/github/contrib/libs/cxxsupp/libcxxrt/exception.cc:839: throw_exception(__cxxabiv1::__cxa_exception*) @ 0x56468BDDF2CC
1. /home/azevaykin/github/contrib/libs/cxxsupp/libcxxrt/exception.cc:882: __cxa_throw @ 0x56468BDDF2CC
2. /home/azevaykin/github/util/generic/hash_table.cpp:50: NPrivate::ThrowKeyNotFoundInHashTableException(TBasicStringBuf<char, std::__y1::char_traits<char>>) @ 0x56468BE5CCC5
3. /home/azevaykin/github/util/generic/hash.h:269: NKikimr::NSchemeShard::TShardInfo& THashMap<NKikimr::NSchemeShard::TShardIdx, NKikimr::NSchemeShard::TShardInfo, THash<NKikimr::NSchemeShard::TShardIdx>, TEqualTo<NKikimr::NSchemeShard::TShardIdx>, std::__y1::allocator<NKikimr::NSchemeShard::TShardIdx>>::at<NKikimr::NSchemeShard::TShardIdx>(NKikimr::NSchemeShard::TShardIdx const&) @ 0x56468F7C32FF
4. /home/azevaykin/github/ydb/core/tx/schemeshard/schemeshard_build_index__progress.cpp:513: NKikimr::TUi64Id<NKikimr::NSchemeShard::TTabletIdTag> NKikimr::NSchemeShard::TSchemeShard::TIndexBuilder::TTxProgress::CommonFillRecord<true, NKikimrTxDataShard::TEvLocalKMeansRequest>(NKikimrTxDataShard::TEvLocalKMeansRequest&, NKikimr::NSchemeShard::TShardIdx, NKikimr::NSchemeShard::TIndexBuildInfo&) @ 0x56468FBEBB6E
5. /home/azevaykin/github/ydb/core/tx/schemeshard/schemeshard_build_index__progress.cpp:654: NKikimr::NSchemeShard::TSchemeShard::TIndexBuilder::TTxProgress::SendKMeansLocalRequest(NKikimr::NSchemeShard::TShardIdx, NKikimr::NSchemeShard::TIndexBuildInfo&) @ 0x56468FBEB6C3
6. /home/azevaykin/github/ydb/core/tx/schemeshard/schemeshard_build_index__progress.cpp:892: NKikimr::NSchemeShard::TSchemeShard::TIndexBuilder::TTxProgress::SendKMeansLocal(NKikimr::NSchemeShard::TIndexBuildInfo&)::'lambda'(NKikimr::NSchemeShard::TShardIdx)::operator()(NKikimr::NSchemeShard::TShardIdx) const @ 0x56468FBEAAB9
7. /home/azevaykin/github/ydb/core/tx/schemeshard/schemeshard_build_index__progress.cpp:781: bool NKikimr::NSchemeShard::TSchemeShard::TIndexBuilder::TTxProgress::SendToShards<NKikimr::NSchemeShard::TSchemeShard::TIndexBuilder::TTxProgress::SendKMeansLocal(NKikimr::NSchemeShard::TIndexBuildInfo&)::'lambda'(NKikimr::NSchemeShard::TShardIdx)>(NKikimr::NSchemeShard::TIndexBuildInfo&, NKikimr::NSchemeShard::TSchemeShard::TIndexBuilder::TTxProgress::SendKMeansLocal(NKikimr::NSchemeShard::TIndexBuildInfo&)::'lambda'(NKikimr::NSchemeShard::TShardIdx)&&) @ 0x56468FBEAAB9
8. /home/azevaykin/github/ydb/core/tx/schemeshard/schemeshard_build_index__progress.cpp:892: NKikimr::NSchemeShard::TSchemeShard::TIndexBuilder::TTxProgress::SendKMeansLocal(NKikimr::NSchemeShard::TIndexBuildInfo&) @ 0x56468FBE1B83
9. /home/azevaykin/github/ydb/core/tx/schemeshard/schemeshard_build_index__progress.cpp:906: NKikimr::NSchemeShard::TSchemeShard::TIndexBuilder::TTxProgress::SendVectorIndex(NKikimr::NSchemeShard::TIndexBuildInfo&) @ 0x56468FBE1B83
10. /home/azevaykin/github/ydb/core/tx/schemeshard/schemeshard_build_index__progress.cpp:987: NKikimr::NSchemeShard::TSchemeShard::TIndexBuilder::TTxProgress::FillVectorIndex(NKikimr::NTabletFlatExecutor::TTransactionContext&, NKikimr::NSchemeShard::TIndexBuildInfo&) @ 0x56468FBE1B83
11. /home/azevaykin/github/ydb/core/tx/schemeshard/schemeshard_build_index__progress.cpp:1065: NKikimr::NSchemeShard::TSchemeShard::TIndexBuilder::TTxProgress::FillIndex(NKikimr::NTabletFlatExecutor::TTransactionContext&, NKikimr::NSchemeShard::TIndexBuildInfo&) @ 0x56468FBDF5C3
12. /home/azevaykin/github/ydb/core/tx/schemeshard/schemeshard_build_index__progress.cpp:1138: NKikimr::NSchemeShard::TSchemeShard::TIndexBuilder::TTxProgress::DoExecute(NKikimr::NTabletFlatExecutor::TTransactionContext&, NActors::TActorContext const&) @ 0x56468FBDDFA7
13. /home/azevaykin/github/ydb/core/tx/schemeshard/schemeshard_build_index_tx_base.cpp:411: NKikimr::NSchemeShard::TSchemeShard::TIndexBuilder::TTxBase::Execute(NKikimr::NTabletFlatExecutor::TTransactionContext&, NActors::TActorContext const&) @ 0x56468FC0931A
14. /home/azevaykin/github/ydb/core/tablet_flat/flat_executor.cpp:1910: NKikimr::NTabletFlatExecutor::TExecutor::ExecuteTransaction(NKikimr::NTabletFlatExecutor::TSeat*) @ 0x56468E6DED12
15. /home/azevaykin/github/ydb/core/tablet_flat/flat_executor.cpp:4143: NKikimr::NTabletFlatExecutor::TExecutor::StateWork(TAutoPtr<NActors::IEventHandle, TDelete>&) @ 0x56468E6CA94E
16. /home/azevaykin/github/ydb/library/actors/core/actor.cpp:280: NActors::IActor::Receive(TAutoPtr<NActors::IEventHandle, TDelete>&) @ 0x56468CA00CB2
2025-04-05T15:09:19.595726Z :TABLET_MAIN NOTICE: Tablet: 72075186224037897  Type: SchemeShard, EReason: ReasonPill, SuggestedGeneration: 16349, KnownGeneration: 16349 Marker# TSYS31
2025-04-05T15:09:19.595740Z :FLAT_TX_SCHEMESHARD INFO: Clear TempDirsState with owners number: 0
@azevaykin
Collaborator Author

azevaykin commented Apr 6, 2025

I reinstalled the cluster.

The second run produced a different error, but this time the index build operation was simply aborted and SchemeShard kept working normally.

2025-04-06T05:27:54.148544Z :BUILD_INDEX NOTICE: Finish TLocalKMeansScan Id: 844424930201761 TabletId: 72075186224058405 RequestSeqNoGeneration: 1 RequestSeqNoRound: 2 Status: BAD_REQUEST Issues { message: "Unknown table id: 11" severity: 1 }

2025-04-06T05:27:54.151029Z :BUILD_INDEX DEBUG: TIndexBuilder::TXTYPE_PROGRESS_INDEX_BUILD: TTxReply : TEvLocalKMeansResponse, TIndexBuildInfo: TBuildInfo{ IndexBuildId: 844424930201761, Uid: , DomainPathId: [OwnerId: 72075186224037897, LocalPathId: 1], TablePathId: [OwnerId: 72075186224037897, LocalPathId: 7], IndexType: EIndexTypeGlobalVectorKmeansTree, IndexName: idx_vector, IndexColumn: embedding, State: Filling, IsCancellationRequested: 0, Issue: , SubscribersCount: 1, CreateSender: [50005:7490060647350208635:72160], AlterMainTableTxId: 0, AlterMainTableTxStatus: StatusSuccess, AlterMainTableTxDone: 0, LockTxId: 844424930217069, LockTxStatus: StatusAccepted, LockTxDone: 1, InitiateTxId: 844424930217070, InitiateTxStatus: StatusAccepted, InitiateTxDone: 1, SnapshotStepId: 0, ApplyTxId: 0, ApplyTxStatus: StatusSuccess, ApplyTxDone: 0, UnlockTxId: 0, UnlockTxStatus: StatusSuccess, UnlockTxDone: 0, ToUploadShards: 0, DoneShards: 0, ShardsInProgress: 72075186224037897:20518, Processed: { upload rows: 432888331, upload bytes: 1778736665415, read rows: 433543562, read bytes: 1781431986922 }, Billed: { upload rows: 0, upload bytes: 0, read rows: 0, read bytes: 0 }}, record: Id: 844424930201761 TabletId: 72075186224058405 RequestSeqNoGeneration: 1 RequestSeqNoRound: 2 Status: BAD_REQUEST Issues { message: "Unknown table id: 11" severity: 1 }

Full log: 2025-04-06_05-27-54.log.zip

@azevaykin
Collaborator Author

An attempt to build a prefix index:

ALTER TABLE alice
ADD INDEX idx_vector
GLOBAL USING vector_kmeans_tree
ON (intent, embedding)
WITH (distance=cosine, vector_type="float", vector_dimension=1024, levels=1, clusters=200);

This attempt produced yet another error, and SchemeShard is again restarting constantly.

2025-04-06T07:52:57.706892Z :BUILD_INDEX INFO: TIndexBuilder::TXTYPE_PROGRESS_INDEX_BUILD: TTxBuildProgress: Resume: id# 844424930186400
2025-04-06T07:52:57.706905Z :BUILD_INDEX DEBUG: TIndexBuilder::TXTYPE_PROGRESS_INDEX_BUILD: TTxBuildProgress: Resume: TBuildInfo{ IndexBuildId: 844424930186400, Uid: , DomainPathId: [OwnerId: 72075186224037897, LocalPathId: 1], TablePathId: [OwnerId: 72075186224037897, LocalPathId: 7], IndexType: EIndexTypeGlobalVectorKmeansTree, IndexName: idx_vector, IndexColumn: intent, IndexColumn: embedding, State: Filling, IsCancellationRequested: 0, Issue: , SubscribersCount: 0, CreateSender: [0:0:0], AlterMainTableTxId: 0, AlterMainTableTxStatus: StatusSuccess, AlterMainTableTxDone: 0, LockTxId: 844424930219287, LockTxStatus: StatusAccepted, LockTxDone: 1, InitiateTxId: 844424930219288, InitiateTxStatus: StatusAccepted, InitiateTxDone: 1, SnapshotStepId: 0, ApplyTxId: 0, ApplyTxStatus: StatusSuccess, ApplyTxDone: 0, UnlockTxId: 0, UnlockTxStatus: StatusSuccess, UnlockTxDone: 0, ToUploadShards: 0, DoneShards: 0, Processed: { upload rows: 288624165, upload bytes: 1198069787672, read rows: 288624165, read bytes: 1198069787672 }, Billed: { upload rows: 0, upload bytes: 0, read rows: 0, read bytes: 0 }}
2025-04-06T07:52:57.706917Z :BUILD_INDEX DEBUG: TIndexBuilder::TXTYPE_PROGRESS_INDEX_BUILD: FillIndex::InitiateShards { K = 200, Level = 2 / 2, Parent = [1..5580836854440..5580836854440], Child = [5580836854441..5580836854441..1121748207742440], State = MultiLocal }, { Rows = 0, Sample = Collect }, { Done = 0, ToUpload = 0, InProgress = 0 }
2025-04-06T07:52:57.706929Z :BUILD_INDEX DEBUG: infinite range { From: -inf, To: inf }
2025-04-06T07:52:57.706932Z :BUILD_INDEX DEBUG: shard 72075186224037897:25539 range { From: -inf, To: { count: 2 } }
2025-04-06T07:52:57.707081Z :TABLET_EXECUTOR CRIT: Tablet 72075186224037897 unhandled exception yexception: ydb/core/scheme/scheme_tablecell.h:171: AsValue<T>() type size8 doesn't match TCell size 21
0. /home/azevaykin/github/contrib/libs/cxxsupp/libcxxrt/exception.cc:839: throw_exception(__cxxabiv1::__cxa_exception*) @ 0x5603D4A8B22C
1. /home/azevaykin/github/contrib/libs/cxxsupp/libcxxrt/exception.cc:882: __cxa_throw @ 0x5603D4A8B22C
2. /home/azevaykin/github/ydb/core/scheme/scheme_tablecell.h:171: unsigned long NKikimr::TCell::AsValue<unsigned long, unsigned long>() const @ 0x5603D447CBED
3. /home/azevaykin/github/ydb/core/tx/schemeshard/schemeshard_info_types.h:3272: NKikimr::NSchemeShard::TIndexBuildInfo::TKMeans::RangeToBorders(NKikimr::TSerializedTableRange const&) const::'lambda0'()::operator()() const @ 0x5603DCAD8B61
4. /home/azevaykin/github/ydb/core/tx/schemeshard/schemeshard_info_types.h:3269: NKikimr::NSchemeShard::TIndexBuildInfo::TKMeans::RangeToBorders(NKikimr::TSerializedTableRange const&) const @ 0x5603DCAD8B61
5. /home/azevaykin/github/ydb/core/tx/schemeshard/schemeshard_info_types.cpp:2256: NKikimr::NSchemeShard::TIndexBuildInfo::AddParent(NKikimr::TSerializedTableRange const&, NKikimr::NSchemeShard::TShardIdx) @ 0x5603DCAD8B61
6. /home/azevaykin/github/ydb/core/tx/schemeshard/schemeshard_build_index__progress.cpp:1328: NKikimr::NSchemeShard::TSchemeShard::TIndexBuilder::TTxProgress::InitiateShards(NKikimr::NIceDb::TNiceDb&, NKikimr::NSchemeShard::TIndexBuildInfo&) @ 0x5603D888DE82
7. /home/azevaykin/github/ydb/core/tx/schemeshard/schemeshard_build_index__progress.cpp:1062: NKikimr::NSchemeShard::TSchemeShard::TIndexBuilder::TTxProgress::FillIndex(NKikimr::NTabletFlatExecutor::TTransactionContext&, NKikimr::NSchemeShard::TIndexBuildInfo&) @ 0x5603D888C84D
8. /home/azevaykin/github/ydb/core/tx/schemeshard/schemeshard_build_index__progress.cpp:1138: NKikimr::NSchemeShard::TSchemeShard::TIndexBuilder::TTxProgress::DoExecute(NKikimr::NTabletFlatExecutor::TTransactionContext&, NActors::TActorContext const&) @ 0x5603D888B027
9. /home/azevaykin/github/ydb/core/tx/schemeshard/schemeshard_build_index_tx_base.cpp:411: NKikimr::NSchemeShard::TSchemeShard::TIndexBuilder::TTxBase::Execute(NKikimr::NTabletFlatExecutor::TTransactionContext&, NActors::TActorContext const&) @ 0x5603D88B639A
10. /home/azevaykin/github/ydb/core/tablet_flat/flat_executor.cpp:1910: NKikimr::NTabletFlatExecutor::TExecutor::ExecuteTransaction(NKikimr::NTabletFlatExecutor::TSeat*) @ 0x5603D738BD12
11. /home/azevaykin/github/ydb/core/tablet_flat/flat_executor.cpp:4143: NKikimr::NTabletFlatExecutor::TExecutor::StateWork(TAutoPtr<NActors::IEventHandle, TDelete>&) @ 0x5603D737794E
12. /home/azevaykin/github/ydb/library/actors/core/actor.cpp:280: NActors::IActor::Receive(TAutoPtr<NActors::IEventHandle, TDelete>&) @ 0x5603D56ACF32
2025-04-06T07:52:57.707134Z :TABLET_MAIN NOTICE: Tablet: 72075186224037897  Type: SchemeShard, EReason: ReasonPill, SuggestedGeneration: 1203, KnownGeneration: 1203 Marker# TSYS31

kunga mentioned this issue Apr 8, 2025
@kunga
Member

kunga commented Apr 10, 2025

As expected, the error is triggered by a shard split:

2025-04-10T16:15:39.1744290939492884Z :FLAT_TX_SCHEMESHARD DEBUG: Want to split tablet 72075186224075165 by size force split by size (shardSize: 2515338457, maxShardSize: 2147483648)

2025-04-10T16:17:19.1744291039966654Z :BUILD_INDEX ERROR: Rejecting TLocalKMeansScan bad request TabletId: 72075186224075165 Id: 281474977255657 TabletId: 72075186224075165 PathId { OwnerId: 72075186224037897 LocalId: 43 } SeqNoGeneration: 18 SeqNoRound: 1 Settings { metric: DISTANCE_COSINE vector_type: VECTOR_TYPE_FLOAT vector_dimension: 1024 } Seed: 72075186224075165 K: 200 Upload: UPLOAD_BUILD_TO_POSTING NeedsRounds: 3 ParentFrom: 3 Child: 601 LevelName: "/Root/testdb/alice/idx_vector/indexImplLevelTable" PostingName: "/Root/testdb/alice/idx_vector/indexImplPostingTable" EmbeddingColumn: "embedding" ParentTo: 3 with response Id: 281474977255657 TabletId: 72075186224075165 RequestSeqNoGeneration: 18 RequestSeqNoRound: 1 Status: BAD_REQUEST Issues { message: "Shard 72075186224075165 is 5 and not ready for requests" severity: 1 } Issues { message: "Unknown table id: 43" severity: 1 }
