tpch10000 q18. WideCombiner single bucket don't fit into memory #11416

Closed
lll-phill-lll opened this issue Nov 8, 2024 · 4 comments · May be fixed by #11471
Labels
area/runtime YDB runtime issues

Comments

@lll-phill-lll
Member

lll-phill-lll commented Nov 8, 2024

bool isNew = SpilledBuckets.front().InMemoryProcessingState->TasteIt();

Backtrace:
https://paste.yandex-team.ru/d7a045b0-8309-4f2f-bc6d-56c514da5928

@vladl2802
Collaborator

vladl2802 commented Nov 13, 2024

What we have tried so far (in #11471):

  1. When extracting data from the combiner, process in-memory buckets first and only then process spilled buckets. This change most likely has no impact, because all buckets will already be spilled before extraction starts.
  2. Add an extra hash function on top of the existing Hasher. Motivation: without it, only 1/4 of the possible hash values occur on q18. The Hasher for integers returns the value itself as the hash, and %32 < 8 holds for the input data, so, for example, splitting values into 128 buckets with hash % 128 leaves only 1/4 of the buckets non-empty. With the extra hash function, values are distributed evenly between buckets.
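The bucket arithmetic in point (2) can be checked with a small standalone sketch. Everything in it is an assumption made for illustration: `IdentityHash`, `MixHash`, `MakeSkewedKeys` and `CountNonEmptyBuckets` are hypothetical names, the keys are synthetic (integers with key % 32 < 8, not q18 data), and the mixer is a generic 64-bit finalizer used as a stand-in, not the code from #11471.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an integer hasher that returns the value itself.
inline uint64_t IdentityHash(uint64_t key) {
    return key;
}

// Hypothetical extra mixing step applied on top of the base hash.
// This is a generic xor-shift/multiply finalizer chosen for illustration only.
inline uint64_t MixHash(uint64_t h) {
    h ^= h >> 33;
    h *= 0xff51afd7ed558ccdULL;
    h ^= h >> 33;
    h *= 0xc4ceb9fe1a85ec53ULL;
    h ^= h >> 33;
    return h;
}

// Synthetic keys with the skew described above: only values with key % 32 < 8.
inline std::vector<uint64_t> MakeSkewedKeys(uint64_t limit) {
    std::vector<uint64_t> keys;
    for (uint64_t k = 0; k < limit; ++k) {
        if (k % 32 < 8) {
            keys.push_back(k);
        }
    }
    return keys;
}

// Counts how many of numBuckets receive at least one key under hash % numBuckets.
inline size_t CountNonEmptyBuckets(const std::vector<uint64_t>& keys,
                                   size_t numBuckets, bool mix) {
    assert(numBuckets > 0);
    std::vector<bool> used(numBuckets, false);
    size_t nonEmpty = 0;
    for (uint64_t key : keys) {
        uint64_t hash = IdentityHash(key);
        if (mix) {
            hash = MixHash(hash);
        }
        const size_t bucket = hash % numBuckets;
        if (!used[bucket]) {
            used[bucket] = true;
            ++nonEmpty;
        }
    }
    return nonEmpty;
}
```

Calling `CountNonEmptyBuckets(MakeSkewedKeys(100000), 128, false)` uses only 32 of the 128 buckets, because the only residues modulo 128 that occur are 0-7, 32-39, 64-71 and 96-103. Passing `true` for `mix` should spread the same keys over far more buckets.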

As a result, the execution time of q18 on scale 100 with 1 task (pragma ydb.MaxTasksPerStage="1") went up from an average of 297s to 357s. Judging by the flame graphs (below), however, the additional hash function itself has no significant impact on performance (0.08% for both of the last combiners). So my guess is that the performance drop is caused by having more, smaller buckets to process than before.

According to @lll-phill-lll, on the 10k scale this fixes the memory limit exception that was firing, and the query timed out after 1 hour of execution.

Without hashing

mem-without-hashing

With hashing

mem

@vladl2802
Collaborator

vladl2802 commented Nov 14, 2024

For point (2), I tried xxh64 (its flame graphs are in the previous comment and also below) and Fibonacci hashing (its flame graph is below).

For some reason that I can't explain yet, xxh64 seems faster than Fibonacci hashing, even though Fibonacci hashing should require fewer operations to compute. I am not sure about this, though, because the execution time is really unstable.

So we will use xxh64 for now. Further progress can be made in block combine and/or after #11591.
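For context on why Fibonacci hashing is expected to be cheap, here is a minimal sketch of the textbook form: one multiplication by 2^64 divided by the golden ratio, followed by one shift that keeps the top bits. `FibonacciBucket` and its `bucketBits` parameter are hypothetical names for this illustration, and the sketch is not the variant that was benchmarked in this issue.

```cpp
#include <cassert>
#include <cstdint>

// Maps a 64-bit hash to one of 2^bucketBits buckets.
// 0x9E3779B97F4A7C15 is 2^64 divided by the golden ratio, rounded to an odd integer.
// The multiplication wraps modulo 2^64, and the shift keeps the top bucketBits bits.
inline uint64_t FibonacciBucket(uint64_t hash, unsigned bucketBits) {
    assert(bucketBits >= 1 && bucketBits <= 63);
    return (hash * 0x9E3779B97F4A7C15ULL) >> (64 - bucketBits);
}
```

With `bucketBits = 7` the result is always in the range 0 to 127. Unlike hash % 128, the bucket index comes from the high bits of the product, so it depends on every bit of the input rather than only the low seven.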

Fibonacci hashing (as here)

mem-fibonacci

xxh64

mem-xxh

@lll-phill-lll
Member Author

2h run results:
https://nda.ya.ru/t/qWfckDan79ezxP
spilling plot: https://nda.ya.ru/t/wp-yyyx779ezvp

Looks suspicious: it seems to have worked for only 30 minutes.

@lll-phill-lll
Member Author

https://nda.ya.ru/t/dlVeCoho79f4uA
Uploading a08810f4946fbb1c.svg…

Error after 32 mins


2 participants