
CS: VERIFY long query interruption in plain_reader/iterator #14832


Closed
naspirato opened this issue Feb 20, 2025 · 2 comments · Fixed by #14888

Comments

Collaborator

naspirato commented Feb 20, 2025

The problem occurs when interrupting a query that has been running for a very long time.

Observed during workload log (bulk_upsert + select) on the LONG cluster
version: 19ac1f8
issue

VERIFY failed (2025-02-19T13:44:48.793918+0300): SelfId=[50014:7473079170149311476:3178];TabletId=72075186224037933;ScanId=2;TxId=562949973191911;ScanGen=1;task_identifier=;verification=!GetContext()->IsAborted();fline=source.cpp:26;
ydb/library/actors/core/log.cpp:754
  ~TVerifyFormattedRecordWriter(): requirement false failed
2025-02-19T14:07:27.176141Z :TX_DATASHARD NOTICE: Outdated readset for 1739525360741:844424933192005 at 72075186224038391
2025-02-19T14:07:27.290883Z :TX_DATASHARD NOTICE: Outdated readset for 1739525360741:844424933192005 at 72075186224038391
2025-02-19T14:07:28.043956Z :TX_COORDINATOR NOTICE: tablet# 72075186224037896 HANDLE EvMediatorQueueRestart MediatorId# 72075186224037890
2025-02-19T14:07:28.045063Z :TX_COORDINATOR NOTICE: tablet# 72075186224037896 HANDLE EvMediatorQueueRestart MediatorId# 72075186224037892
2025-02-19T14:07:28.045079Z :TX_COORDINATOR NOTICE: tablet# 72075186224037896 HANDLE EvMediatorQueueRestart MediatorId# 72075186224037889
2025-02-19T14:07:28.206363Z :BS_PROXY_RANGE NOTICE: [f640b2101ee9906b] Result# TEvRangeResult {Status# OK From# [72075186224037892:550:1:0:0:0:0] To# [72075186224037892:551:0:0:16777215:67108863:0] Size# 0} Marker# DSR06
2025-02-19T14:07:28.240020Z :TX_COORDINATOR NOTICE: tablet# 72075186224037896 HANDLE EvMediatorQueueRestart MediatorId# 72075186224037892
2025-02-19T14:07:28.337038Z :TX_COORDINATOR NOTICE: tablet# 72075186224037896 HANDLE EvMediatorQueueRestart MediatorId# 72075186224037890
2025-02-19T14:07:28.455021Z :BS_PROXY_RANGE NOTICE: [7ddafb4b78ae26a0] Result# TEvRangeResult {Status# OK From# [72075186224037889:555:1:0:0:0:0] To# [72075186224037889:556:0:0:16777215:67108863:0] Size# 0} Marker# DSR06
2025-02-19T14:07:28.468331Z :TX_COORDINATOR NOTICE: tablet# 72075186224037896 HANDLE EvMediatorQueueRestart MediatorId# 72075186224037889
0. /-S/util/system/yassert.cpp:83: InternalPanicImpl @ 0x1CA07FF8
1. /-S/util/system/yassert.cpp:55: Panic @ 0x1C9F8FBA
2. /tmp//-S/ydb/library/actors/core/log.cpp:754: ~TVerifyFormattedRecordWriter @ 0x1F02D025
3. /tmp//-S/ydb/core/tx/columnshard/engines/reader/plain_reader/iterator/source.cpp:26: RegisterInterval @ 0x3642A0A6
4. /tmp//-S/ydb/core/tx/columnshard/engines/reader/plain_reader/iterator/interval.cpp:53: TFetchingInterval @ 0x364232E8
5. /-S/contrib/libs/cxxsupp/libcxx/include/__memory/allocator.h:167: construct<NKikimr::NOlap::NReader::NPlain::TFetchingInterval, NKikimr::NArrow::NMerger::TSortableBatchPosition &, const NKikimr::NArrow::NMerger::TSortableBatchPosition &, const unsigned int &, const THashMap<unsigned int, std::__y1::shared_ptr<NKikimr::NOlap::NReader::NPlain::IDataSource>, THash<unsigned int>, TEqualTo<unsigned int>, std::__y1::allocator<unsigned int> > &, std::__y1::shared_ptr<NKikimr::NOlap::NReader::NPlain::TSpecialReadContext> &, bool, bool, bool> @ 0x36415010
6. /-S/contrib/libs/cxxsupp/libcxx/include/__memory/allocator_traits.h:319: construct<NKikimr::NOlap::NReader::NPlain::TFetchingInterval, NKikimr::NArrow::NMerger::TSortableBatchPosition &, const NKikimr::NArrow::NMerger::TSortableBatchPosition &, const unsigned int &, const THashMap<unsigned int, std::__y1::shared_ptr<NKikimr::NOlap::NReader::NPlain::IDataSource>, THash<unsigned int>, TEqualTo<unsigned int>, std::__y1::allocator<unsigned int> > &, std::__y1::shared_ptr<NKikimr::NOlap::NReader::NPlain::TSpecialReadContext> &, bool, bool, bool, 0> @ 0x36415010
7. /-S/contrib/libs/cxxsupp/libcxx/include/__memory/shared_ptr.h:296: __shared_ptr_emplace<NKikimr::NArrow::NMerger::TSortableBatchPosition &, const NKikimr::NArrow::NMerger::TSortableBatchPosition &, const unsigned int &, const THashMap<unsigned int, std::__y1::shared_ptr<NKikimr::NOlap::NReader::NPlain::IDataSource>, THash<unsigned int>, TEqualTo<unsigned int>, std::__y1::allocator<unsigned int> > &, std::__y1::shared_ptr<NKikimr::NOlap::NReader::NPlain::TSpecialReadContext> &, bool, bool, bool, std::__y1::allocator<NKikimr::NOlap::NReader::NPlain::TFetchingInterval>, 0> @ 0x36415010
8. /-S/contrib/libs/cxxsupp/libcxx/include/__memory/shared_ptr.h:857: allocate_shared<NKikimr::NOlap::NReader::NPlain::TFetchingInterval, std::__y1::allocator<NKikimr::NOlap::NReader::NPlain::TFetchingInterval>, NKikimr::NArrow::NMerger::TSortableBatchPosition &, const NKikimr::NArrow::NMerger::TSortableBatchPosition &, const unsigned int &, const THashMap<unsigned int, std::__y1::shared_ptr<NKikimr::NOlap::NReader::NPlain::IDataSource>, THash<unsigned int>, TEqualTo<unsigned int>, std::__y1::allocator<unsigned int> > &, std::__y1::shared_ptr<NKikimr::NOlap::NReader::NPlain::TSpecialReadContext> &, bool, bool, bool, 0> @ 0x36415010
9. /-S/contrib/libs/cxxsupp/libcxx/include/__memory/shared_ptr.h:865: make_shared<NKikimr::NOlap::NReader::NPlain::TFetchingInterval, NKikimr::NArrow::NMerger::TSortableBatchPosition &, const NKikimr::NArrow::NMerger::TSortableBatchPosition &, const unsigned int &, const THashMap<unsigned int, std::__y1::shared_ptr<NKikimr::NOlap::NReader::NPlain::IDataSource>, THash<unsigned int>, TEqualTo<unsigned int>, std::__y1::allocator<unsigned int> > &, std::__y1::shared_ptr<NKikimr::NOlap::NReader::NPlain::TSpecialReadContext> &, bool, bool, bool, 0> @ 0x36415010
10. /tmp//-S/ydb/core/tx/columnshard/engines/reader/plain_reader/iterator/scanner.cpp:161: BuildNextInterval @ 0x36415010
11. /tmp//-S/ydb/core/tx/columnshard/engines/reader/plain_reader/iterator/plain_read_data.cpp:78: DoReadNextInterval @ 0x363FA494
12. /-S/ydb/core/tx/columnshard/engines/reader/abstract/read_context.h:214: ReadNextInterval @ 0x363E385B
13. /tmp//-S/ydb/core/tx/columnshard/engines/reader/common_reader/iterator/iterator.cpp:25: ReadNextInterval @ 0x363E385B
14. /tmp//-S/ydb/core/tx/columnshard/engines/reader/actor/actor.cpp:305: ContinueProcessing @ 0x367F8F6C
15. /tmp//-S/ydb/core/tx/columnshard/engines/reader/actor/actor.cpp:121: HandleScan @ 0x367FC111
16. /-S/ydb/core/tx/columnshard/engines/reader/actor/actor.h:57: StateScan @ 0x367F809E
17. /-S/ydb/library/actors/core/actor.h:553: Receive @ 0x1F00DD9C
18. /tmp//-S/ydb/library/actors/core/executor_thread.cpp:269: Execute @ 0x1F0083D4
19. /tmp//-S/ydb/library/actors/core/executor_thread.cpp:460: operator() @ 0x1F010CCE
20. /tmp//-S/ydb/library/actors/core/executor_thread.cpp:512: ProcessExecutorPool @ 0x1F010229
21. /tmp//-S/ydb/library/actors/core/executor_thread.cpp:538: ThreadProc @ 0x1F0121BE
22. /-S/util/system/thread.cpp:244: ThreadProxy @ 0x1CA14134
23. /tmp//-S/contrib/libs/clang18-rt/lib/asan/asan_interceptors.cpp:239: asan_thread_start @ 0x1C6BBD48
24. ??:0: ?? @ 0x7F9BE52FF608
25. ??:0: ?? @ 0x7F9BE5224352
GRPCs port is not defined.
Determined node ID: 0
Trying to register dynamic node to vla5-2568.search.yandex.net:2135
Success. Registered as 50009
Node name:
Trying to get configs from vla5-2573.search.yandex.net:2135
Success.
configured
Starting Kikimr r-1 built by kirrysin
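
For readers unfamiliar with this failure mode: the failed condition in the log is `!GetContext()->IsAborted()` at `source.cpp:26`, hit in `RegisterInterval` while `BuildNextInterval` was still creating intervals. The sketch below is not YDB code and is not taken from #14888. It is a minimal standalone illustration, with hypothetical names (`Context`, `Source`, `RegisterIntervalStrict`, `RegisterIntervalTolerant`), of the difference between asserting that a scan is never aborted at this point and tolerating an abort that arrives while the reader is still working. An exception stands in for the process-wide panic that a real VERIFY triggers.

```cpp
#include <atomic>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for a shared read context (not the YDB class).
struct Context {
    std::atomic<bool> Aborted{false};
    bool IsAborted() const { return Aborted.load(); }
    void Abort() { Aborted.store(true); }
};

// Hypothetical stand-in for a data source that tracks registered intervals.
class Source {
public:
    explicit Source(std::shared_ptr<Context> ctx) : Ctx(std::move(ctx)) {}

    // Strict variant: an aborted context is treated as an invariant violation.
    // A real VERIFY would panic the whole process; here we throw instead so
    // the behaviour can be observed without terminating.
    void RegisterIntervalStrict(int intervalId) {
        if (Ctx->IsAborted()) {
            throw std::logic_error("verification=!GetContext()->IsAborted()");
        }
        Intervals.push_back(intervalId);
    }

    // Tolerant variant: an aborted context is an expected state after the
    // query was interrupted, so registration is skipped and reported.
    bool RegisterIntervalTolerant(int intervalId) {
        if (Ctx->IsAborted()) {
            return false;
        }
        Intervals.push_back(intervalId);
        return true;
    }

    std::size_t IntervalsCount() const { return Intervals.size(); }

private:
    std::shared_ptr<Context> Ctx;
    std::vector<int> Intervals;
};
```

In this sketch, calling `Abort()` on the context and then `RegisterIntervalStrict` throws, while `RegisterIntervalTolerant` returns `false` and leaves the interval list unchanged. A caller of the tolerant variant still has to stop producing intervals and finish the scan, otherwise the query can appear to hang.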
@naspirato naspirato changed the title CS: long query interuption CS: long query interuption in plain_reader/iterator Feb 20, 2025
@naspirato naspirato changed the title CS: long query interuption in plain_reader/iterator CS: VERIFY long query interuption in plain_reader/iterator Feb 20, 2025
@naspirato naspirato marked this as a duplicate of #14840 Feb 20, 2025
Collaborator Author

It should be fixed in #14888.

@naspirato naspirato linked a pull request Feb 21, 2025 that will close this issue
Collaborator Author

naspirato commented Feb 21, 2025

LONG cluster

  • built with ./ya make --build=release ydb/apps/ydbd
  • deployed with ./ydbd_slice update $LONG all --binary ~/ydbd_pr_14888
  • started workload log column (bulk_upsert + select); the data was old, left over from earlier runs
  • upserts are running and visible in monitoring (screenshot): https://nda.ya.ru/t/BYS4reMe7CB4Fd
  • selects were attempted, but not a single one succeeded, not even SELECT * FROM log_workload_column limit 1; I get errors like these:
  • Scan failed at tablet 72075186224037913, reason: task_error:cannot allocate memory for step ALLOCATE_MEMORY::Fetching: 'GLOBAL::(limit:10737418240;val:10555795466;delta=247283655);', code: 2013
  • Scan failed at tablet 72075186224037920, reason: ColumnShard scanner timeout: HAS_ACK=1, code: 2013
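
The numbers in the first error can be checked by hand. The sketch below assumes, without confirmation from the thread, that `val` is the memory already accounted for, `delta` is the size of the new request, and the allocation is refused when `val + delta` exceeds `limit`. The helper names `FitsInLimit` and `Overshoot` are hypothetical and do not come from YDB.

```cpp
#include <cstdint>

// Values copied from the error message above.
constexpr std::uint64_t kLimit = 10737418240ULL;  // limit: exactly 10 GiB
constexpr std::uint64_t kVal   = 10555795466ULL;  // val: assumed "already in use"
constexpr std::uint64_t kDelta = 247283655ULL;    // delta: assumed "newly requested"

// Hypothetical admission check: the request fits only if it does not push
// usage past the limit. Written to avoid unsigned overflow.
constexpr bool FitsInLimit(std::uint64_t limit, std::uint64_t val, std::uint64_t delta) {
    return val <= limit && delta <= limit - val;
}

// Number of bytes by which the request would exceed the limit (0 if it fits).
// Assumes val + delta does not overflow, which holds for these magnitudes.
constexpr std::uint64_t Overshoot(std::uint64_t limit, std::uint64_t val, std::uint64_t delta) {
    return FitsInLimit(limit, val, delta) ? 0 : (val + delta) - limit;
}

static_assert(kLimit == (10ULL << 30), "limit is 10 GiB");
static_assert(kLimit - kVal == 181622774ULL, "remaining headroom in bytes");
static_assert(!FitsInLimit(kLimit, kVal, kDelta), "request does not fit");
static_assert(Overshoot(kLimit, kVal, kDelta) == 65660881ULL, "bytes over the limit");
```

Under these assumptions, roughly 173 MiB of headroom was left against a request of roughly 236 MiB, so the request overshoots the 10 GiB global limit by about 63 MiB.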

But!

  • I see no errors at all in the logs (so I can assume that fix 7183e7a is successful =)
  • I see no minidumps or core dumps

@ivanmorozov333 suggested that the fix worked and the error is gone, but that it may have caused the queries to hang.
