Unexpected "Mkql memory limit exceeded" problem after the query failure (tpcds q14 & q24 problems) #6832

Closed
gridnevvvit opened this issue Jul 18, 2024 · 1 comment · Fixed by #8356
@gridnevvvit
Member

The benchmark results show a problem: after one query fails with "Mkql memory limit exceeded", the next query can fail with the same error but a different limit, or even without having made any extra memory allocations. This needs to be investigated and fixed.

Failed:

: Error: Mkql memory limit exceeded, limit: 5358747648, host: d-c0f9g9voc9o4o1m4sj49-ydb-testing-sas-0-14, canAllocateExtraMemory: 1, memory manager details: TxResourcesInfo{ Memory initially granted resources: 2097152, extra allocations 10718543872, execution units: 0, started at: 2024-07-14T13:31:06.488128Z }, code: 2029

The error above is how query q13 fails.

execution units = 0 is fine, but it also reports

extra allocations 10718543872, which is about 10 GB

q14

Failed:

: Error: Mkql memory limit exceeded, limit: 31457280, host: d-c0f9g9voc9o4o1m4sj49-ydb-testing-sas-0-16, canAllocateExtraMemory: 1, memory manager details: TxResourcesInfo{ Memory initially granted resources: 13107200, extra allocations 125829120, execution units: 0, started at: 2024-07-14T13:32:39.948055Z }, code: 2029
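To make the byte counts in the two messages easier to compare, the fields can be pulled out with a small script. This is a minimal sketch and not part of YDB: the helper name `parse_mkql_error` and the regular expression are assumptions inferred only from the two messages quoted in this issue, and the message format may differ in other versions.

```python
import re

# Pattern inferred from the two error messages quoted in this issue.
# Assumption: the field order and wording are stable; other YDB versions
# may format the message differently.
MKQL_ERROR_RE = re.compile(
    r"limit: (?P<limit>\d+), host: (?P<host>[^,]+), "
    r"canAllocateExtraMemory: (?P<can_extra>\d+), .*?"
    r"initially granted resources: (?P<granted>\d+), "
    r"extra allocations (?P<extra>\d+)"
)

MIB = 1024 * 1024


def parse_mkql_error(message):
    """Hypothetical helper: return the fields of one error message,
    or None if the message does not match the pattern."""
    m = MKQL_ERROR_RE.search(message)
    if m is None:
        return None
    return {
        "host": m.group("host"),
        "limit": int(m.group("limit")),
        "can_allocate_extra": m.group("can_extra") == "1",
        "granted": int(m.group("granted")),
        "extra": int(m.group("extra")),
    }


# The two messages from this issue, copied verbatim.
Q13_ERROR = (
    ": Error: Mkql memory limit exceeded, limit: 5358747648, "
    "host: d-c0f9g9voc9o4o1m4sj49-ydb-testing-sas-0-14, "
    "canAllocateExtraMemory: 1, memory manager details: TxResourcesInfo{ "
    "Memory initially granted resources: 2097152, "
    "extra allocations 10718543872, execution units: 0, "
    "started at: 2024-07-14T13:31:06.488128Z }, code: 2029"
)
Q14_ERROR = (
    ": Error: Mkql memory limit exceeded, limit: 31457280, "
    "host: d-c0f9g9voc9o4o1m4sj49-ydb-testing-sas-0-16, "
    "canAllocateExtraMemory: 1, memory manager details: TxResourcesInfo{ "
    "Memory initially granted resources: 13107200, "
    "extra allocations 125829120, execution units: 0, "
    "started at: 2024-07-14T13:32:39.948055Z }, code: 2029"
)

# Print each message's sizes in MiB so the limits can be compared directly.
for name, message in (("q13", Q13_ERROR), ("q14", Q14_ERROR)):
    fields = parse_mkql_error(message)
    print(
        f"{name}: limit={fields['limit'] / MIB:g} MiB, "
        f"granted={fields['granted'] / MIB:g} MiB, "
        f"extra={fields['extra'] / MIB:g} MiB"
    )
```

For q13 this gives extra allocations of 10222 MiB (the "about 10 GB" noted above) against a limit of 5110.5 MiB. For q14 the extra allocations are 120 MiB against a limit of 30 MiB.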

@gridnevvvit changed the title from "Unexpected "Mkql memory limit exceeded" problem after the query failure" to "Unexpected "Mkql memory limit exceeded" problem after the query failure (tpcds q14 & q24 problems)" on Jul 18, 2024
@zverevgeny added this to the Pass TPC-DS 1TB milestone on Jul 18, 2024
@abyss7
Collaborator

abyss7 commented Jul 22, 2024

Part of the log of the failing request:

kikimr.log.22.gz:Jul 21 12:59:25 ydb-sas-testing-0010 kikimr_31003[237191]: 2024-07-21T09:59:25.872918Z :KQP_COMPUTE WARN: TxId: 281474978630662, task: 9. [Mem] memory 31457280 NOT granted
kikimr.log.22.gz:Jul 21 12:59:25 ydb-sas-testing-0010 kikimr_31003[237191]: 2024-07-21T09:59:25.872910Z :KQP_COMPUTE WARN: fline=kqp_compute_actor_factory.cpp:42;problem=cannot_allocate_memory;tx_id=281474978630662;task_id=9;memory=31457280;
kikimr.log.22.gz:Jul 21 12:59:25 ydb-sas-testing-0010 kikimr_31003[237191]: 2024-07-21T09:59:25.873056Z :KQP_COMPUTE ERROR: SelfId: [50003:7394026568098931002:7141], TxId: 281474978630662, task: 9. Ctx: { SessionId : ydb://session/3?node_id=50003&id=MmY5Y2I4MjktN2ExOTU2NWQtY2RhMDViOS00MGQzZGY3Ng==.  CustomerSuppliedId : .  TraceId : 01j3abpmm4fqhhg04mn5mq0rx4.  CurrentExecutionId : .  Database : /Root/db1.  PoolId : . }. InternalError: OVERLOADED KIKIMR_PRECONDITION_FAILED: { <main>: Error: Mkql memory limit exceeded, limit: 524288, host: ydb-sas-testing-0010.search.yandex.net, canAllocateExtraMemory: 1, memory manager details: TxResourcesInfo{ Memory initially granted resources: 5242880, extra allocations 0, execution units: 0, started at: 2024-07-21T09:59:25.872097Z }, code: 2029 }.
kikimr.log.22.gz:Jul 21 12:59:25 ydb-sas-testing-0010 kikimr_31003[237191]: 2024-07-21T09:59:25.873090Z :KQP_COMPUTE WARN: TxId: 281474978630662, task: 11. [Mem] memory 31457280 NOT granted
kikimr.log.22.gz:Jul 21 12:59:25 ydb-sas-testing-0010 kikimr_31003[237191]: 2024-07-21T09:59:25.873102Z :KQP_COMPUTE WARN: fline=kqp_compute_actor_factory.cpp:42;problem=cannot_allocate_memory;tx_id=281474978630662;task_id=12;memory=31457280;
kikimr.log.22.gz:Jul 21 12:59:25 ydb-sas-testing-0010 kikimr_31003[237191]: 2024-07-21T09:59:25.873106Z :KQP_COMPUTE WARN: TxId: 281474978630662, task: 12. [Mem] memory 31457280 NOT granted
kikimr.log.22.gz:Jul 21 12:59:25 ydb-sas-testing-0010 kikimr_31003[237191]: 2024-07-21T09:59:25.873087Z :KQP_COMPUTE WARN: fline=kqp_compute_actor_factory.cpp:42;problem=cannot_allocate_memory;tx_id=281474978630662;task_id=11;memory=31457280;
kikimr.log.22.gz:Jul 21 12:59:25 ydb-sas-testing-0010 kikimr_31003[237191]: 2024-07-21T09:59:25.873197Z :KQP_COMPUTE ERROR: SelfId: [50003:7394026568098931008:7167], TxId: 281474978630662, task: 12. Ctx: { TraceId : 01j3abpmm4fqhhg04mn5mq0rx4.  SessionId : ydb://session/3?node_id=50003&id=MmY5Y2I4MjktN2ExOTU2NWQtY2RhMDViOS00MGQzZGY3Ng==.  CustomerSuppliedId : .  CurrentExecutionId : .  PoolId : .  Database : /Root/db1. }. InternalError: OVERLOADED KIKIMR_PRECONDITION_FAILED: { <main>: Error: Mkql memory limit exceeded, limit: 524288, host: ydb-sas-testing-0010.search.yandex.net, canAllocateExtraMemory: 1, memory manager details: TxResourcesInfo{ Memory initially granted resources: 6291456, extra allocations 0, execution units: 0, started at: 2024-07-21T09:59:25.872097Z }, code: 2029 }.

The top-level error message looks like:

<main>: Error: Mkql memory limit exceeded, limit: 524288, host: ydb-sas-testing-0010.search.yandex.net, canAllocateExtraMemory: 1, memory manager details: TxResourcesInfo{ Memory initially granted resources: 5242880, extra allocations 0, execution units: 0, started at: 2024-07-21T09:59:25.872097Z }, code: 2029
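The log excerpt above interleaves "NOT granted" warnings for several tasks of the same transaction. A short script can group them by transaction and task. This is a minimal sketch: the helper name `collect_denied` and the pattern are assumptions based only on the lines quoted in this comment, and the sample lines are the quoted ones with the syslog prefix dropped.

```python
import re
from collections import defaultdict

# Pattern inferred from the "[Mem] memory ... NOT granted" warnings quoted
# above (assumption: the wording is the same for every such warning).
NOT_GRANTED_RE = re.compile(
    r"TxId: (?P<tx>\d+), task: (?P<task>\d+)\. "
    r"\[Mem\] memory (?P<mem>\d+) NOT granted"
)


def collect_denied(lines):
    """Hypothetical helper: map TxId -> {task id -> bytes that were denied}."""
    denied = defaultdict(dict)
    for line in lines:
        m = NOT_GRANTED_RE.search(line)
        if m:
            denied[int(m.group("tx"))][int(m.group("task"))] = int(m.group("mem"))
    return {tx: dict(tasks) for tx, tasks in denied.items()}


# Lines from the excerpt above, with the syslog prefix removed for brevity.
SAMPLE = [
    "2024-07-21T09:59:25.872918Z :KQP_COMPUTE WARN: TxId: 281474978630662, "
    "task: 9. [Mem] memory 31457280 NOT granted",
    "2024-07-21T09:59:25.872910Z :KQP_COMPUTE WARN: "
    "fline=kqp_compute_actor_factory.cpp:42;problem=cannot_allocate_memory;"
    "tx_id=281474978630662;task_id=9;memory=31457280;",
    "2024-07-21T09:59:25.873090Z :KQP_COMPUTE WARN: TxId: 281474978630662, "
    "task: 11. [Mem] memory 31457280 NOT granted",
    "2024-07-21T09:59:25.873106Z :KQP_COMPUTE WARN: TxId: 281474978630662, "
    "task: 12. [Mem] memory 31457280 NOT granted",
]

for tx, tasks in collect_denied(SAMPLE).items():
    for task, mem in sorted(tasks.items()):
        print(f"TxId {tx}, task {task}: {mem} bytes not granted")
```

On the sample lines this reports tasks 9, 11 and 12 of TxId 281474978630662, each denied 31457280 bytes (30 MiB), while the error text for the same tasks reports a limit of 524288 bytes.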

@abyss7 reopened this on Aug 29, 2024
@abyss7 linked a pull request on Aug 29, 2024 that will close this issue