Scan failed: count(*) #11206

Closed
vlad-gogov opened this issue Nov 2, 2024 · 12 comments · Fixed by #15323 or #15520
Comments

@vlad-gogov
vlad-gogov commented Nov 2, 2024

Cluster: OLAP TESTING VLA COMMON3
Version: ydb-stable-24-3-12
Logs: https://paste.yandex-team.ru/0bfde892-430c-4b86-84b4-11feeb0491ff
Query: SELECT COUNT(*) FROM `raw/kikimr_ydb_kikimr-log`;
Error: Scan failed at tablet 72075186224042551, reason: task_error:cannot read blob range { Blob: DS:2181038094:[72075186224042551:3:7044832:2:64:4488:0] Offset: 3576 Size: 912 }

@vlad-gogov
Author

Cluster: OLAP TESTING VLA COMMON3
Version: stable-24-3.c7e80d2
Logs: https://paste.yandex-team.ru/40ca9dc7-c272-4234-b9f4-ee8d838a86c7
Query: SELECT COUNT(*) FROM `raw/kikimr_ydb_kikimr-log`;
Error: Scan failed at tablet 72075186224042540, reason: task_error:cannot read blob range { Blob: DS:2181038110:[72075186224042540:2558:19314144:2:1:17464:0] Offset: 12928 Size: 344 }

vlad-gogov self-assigned this Dec 11, 2024
vlad-gogov changed the title from "Scan failed after count(*)" to "Scan failed: count(*)" Dec 11, 2024
@naspirato
Collaborator

naspirato commented Feb 11, 2025

I set a high priority because:

  1. this is a very frequent problem on 24-3-hotfix-15
  2. I ran into the problem in main around Feb 1, 2025
  3. rerunning the query does not fix the problem

@zverevgeny
Collaborator

@naspirato please describe how you reproduced this on main

@Hor911
Collaborator

Hor911 commented Feb 20, 2025

@zverevgeny it seems that this error is more likely to occur under high load, with CPU consumption at 100%

naspirato marked this as a duplicate of #11858 Feb 20, 2025
@naspirato
Collaborator

naspirato commented Feb 20, 2025

Last occurred:
The error was last caught on Feb 10 in this PR during an ASan run, in the test ydb/tests/olap/scenario/test_alter_tiering.py.TestAlterTiering.test[many_tables] (report).

To reproduce, try:
./ya make -ttt --build "release" --sanitize="address" -DDEBUGINFO_LINES_ONLY -F 'test_alter_tiering.py::TestAlterTiering::test[many_tables]' ydb/tests/olap/scenario

@dorooleg
Collaborator

A similar problem showed up on the yaem database:

ydb -e grpcs://lb.cc8ltdnbj4lhkbct494a.ydb.mdb.cloud-preprod.yandex.net:2135  -d /pre-prod_global/yc.yaem.service-cloud/cc8ltdnbj4lhkbct494a sql -s 'SELECT `class`, COUNT(`class`) AS `count` FROM `skladnica/events` GROUP BY class ORDER BY count DESC'
Status: BAD_REQUEST
Issues: 
<main>: Error: Table /pre-prod_global/yc.yaem.service-cloud/cc8ltdnbj4lhkbct494a/skladnica/events (shard 72075186293513193) scan failed, reason: cannot build metadata/Snapshot too old: {1739986234280:max}. CS min read snapshot: {1739986256000:max}. now: 2025-02-19T17:35:56.124479Z, code: 2017
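
The numbers in this message can be compared with a little arithmetic. The sketch below assumes, without confirmation from the thread, that the first component of each `{...:max}` snapshot is a timestamp in milliseconds since the Unix epoch. The helper name `to_epoch_ms` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(iso_utc: str) -> int:
    """Convert an ISO-8601 UTC timestamp ending in 'Z' to whole milliseconds."""
    moment = datetime.fromisoformat(iso_utc.replace("Z", "+00:00"))
    # Integer division of timedeltas avoids float rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


# Values copied from the error message above.
# ASSUMPTION: both snapshot values are milliseconds since the Unix epoch.
requested_ms = 1739986234280   # "Snapshot too old: {1739986234280:max}"
min_read_ms = 1739986256000    # "CS min read snapshot: {1739986256000:max}"
now_ms = to_epoch_ms("2025-02-19T17:35:56.124479Z")

lag_ms = min_read_ms - requested_ms        # how far the request is behind the minimum
requested_age_ms = now_ms - requested_ms   # age of the requested snapshot
min_read_age_ms = now_ms - min_read_ms     # age of the oldest readable snapshot

print(f"requested snapshot lags the minimum by {lag_ms} ms")
print(f"requested snapshot age: {requested_age_ms} ms")
print(f"min read snapshot age:  {min_read_age_ms} ms")
```

Under that assumption, the requested snapshot is about 21.7 s older than the minimum readable one, and the minimum itself trails `now` by roughly 300 s.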

dorooleg assigned dorooleg and unassigned vlad-gogov Feb 26, 2025
@maximyurchuk
Collaborator

maximyurchuk commented Mar 3, 2025

A similar problem showed up in a long-living cluster under a workload of bulk_upsert + select.

Version: main.4b70624

Scan failed at tablet 72075186224040148, reason: task_error:cannot read blob range { Blob: DS:2181038092:[72075186224040148:137:1207559:9:11:5239304:0] Offset: 5197880 Size: 40256 }

@maximyurchuk
Collaborator

At the olap tests meeting, a suspicion came up that the message is not quite accurate. It may actually be a timeout while reading the blob from BS.

@zverevgeny
Collaborator

We need to look at how DS handles timeouts and where exactly the timeout comes from.
@dorooleg
Option: do not set a timeout, or retry.
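
The retry option could look roughly like the sketch below. This is illustrative Python, not YDB code: `DeadlineError`, `read_with_retry`, the `read_blob_range` callable, and the backoff parameters are all hypothetical names chosen for the example.

```python
import time
from typing import Callable


class DeadlineError(Exception):
    """Hypothetical stand-in for a blob read that ended with status DEADLINE."""


def read_with_retry(
    read_blob_range: Callable[[], bytes],
    attempts: int = 3,
    base_delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call read_blob_range, retrying on DeadlineError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return read_blob_range()
        except DeadlineError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the deadline to the caller
            sleep(base_delay_s * 2 ** attempt)
    raise ValueError("attempts must be positive")
```

Only the deadline error is retried here; any other failure propagates immediately, so a genuinely missing blob would still fail fast.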

@dorooleg
Collaborator

  • The deadline is hardcoded in the code
  • It is set here, when the message is sent to the BS proxy
  • For Scan requests it is 10s; for background processes, 30s or inf.
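
The policy in this list can be summarized in a small sketch. The names `ReadKind` and `read_deadline` are hypothetical and the code is not taken from YDB; only the 10s / 30s / inf values come from the comment above. The thread does not say when a background read gets 30s versus no deadline, so the sketch leaves that choice as a parameter.

```python
import math
from enum import Enum


class ReadKind(Enum):
    SCAN = "scan"
    BACKGROUND = "background"


# Values quoted in the comment above; everything else is illustrative.
SCAN_TIMEOUT_S = 10.0
BACKGROUND_TIMEOUT_S = 30.0


def read_deadline(kind: ReadKind, now_s: float, background_unbounded: bool = False) -> float:
    """Return the absolute deadline (in seconds) for a blob read of the given kind."""
    if kind is ReadKind.SCAN:
        return now_s + SCAN_TIMEOUT_S
    if background_unbounded:
        return math.inf  # "inf": the read never times out
    return now_s + BACKGROUND_TIMEOUT_S
```

With a fixed 10s budget, a scan read that queues behind other work on a saturated node can expire before it is served, which would fit the earlier observation that the error is more likely under high load.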

dorooleg linked a pull request Mar 10, 2025 that will close this issue
@dorooleg
Collaborator

Opened a review to get rid of the timeout in this place: #15520

@dorooleg
Collaborator

Reproduced the error on a long-lived test stand; it really is a deadline. #15520 should help.

error
Scan failed at tablet 72075186224040184, reason: task_error:Error reading blob range for columns: { Blob: DS:2181038083:[72075186224040184:3863:243:53:19:5146248:0] Offset: 5105448 Size: 39648 }, error: cannot get blob: , status: DEADLINE
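
When triaging many such reports, it can help to pull the tablet, blob range, and status out of each message so they can be grouped. The sketch below is a hypothetical helper; its regular expression is inferred only from the messages quoted in this thread, so other message layouts may not match.

```python
import re
from typing import Optional

# ASSUMPTION: the layout is inferred from the samples in this thread only.
SCAN_ERROR_RE = re.compile(
    r"Scan failed at tablet (?P<tablet>\d+), reason: (?P<reason>.*?)"
    r"\{ Blob: (?P<blob>\S+) Offset: (?P<offset>\d+) Size: (?P<size>\d+) \}"
    r"(?:.*status: (?P<status>\w+))?"
)


def parse_scan_error(message: str) -> Optional[dict]:
    """Extract tablet, blob, offset, size and (if present) status from a scan error."""
    match = SCAN_ERROR_RE.search(message)
    if match is None:
        return None
    return {
        "tablet": int(match.group("tablet")),
        "blob": match.group("blob"),
        "offset": int(match.group("offset")),
        "size": int(match.group("size")),
        "status": match.group("status"),  # None for the older message format
    }
```

The older messages in this thread carry no status, so `status` comes back as `None` for them; only the newer format exposes `DEADLINE` directly.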
