Skip to content

get rid of std::hash #11591

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
lll-phill-lll opened this issue Nov 14, 2024 · 2 comments
Open

get rid of std::hash #11591

lll-phill-lll opened this issue Nov 14, 2024 · 2 comments
Assignees
Labels
area/runtime YDB runtime issues

Comments

@lll-phill-lll
Copy link
Member

lll-phill-lll commented Nov 14, 2024

As was noted by @vladl2802 in #11416 we have non even distribution of values between buckets while spilling.

The root couse of it is that we rely on a hash function here:

auto bucketId = hash % SpilledBucketCount;

which appears to be std::hash which just returns the value itself: https://godbolt.org/z/es8dxMGeY

Hash function is set here:

return std::hash<T>()(value.Get<T>());

As a temp measure we change the algorithm of bucket selection from hash%128 to XXHASH(hash)%128. pr: #11471

Also, with std::hash we can face compatibility issues while changing MKQL_RUNTIME version.

So, the proposal of this task is to change std::hash to some other hash function. Hash functions to consider:
rh hash:

ui64 bucket = ((SelfHash ^ hash) * 11400714819323198485llu) >> capacityShift;

xxhash: https://github.com/Cyan4973/xxHash. We already use xxhash in GraceJoin:
XXH64_hash_t hash = XXH64(TempTuple.data() + NullsBitmapSize_, (TempTuple.size() - NullsBitmapSize_) * sizeof(ui64), 0);

@lll-phill-lll lll-phill-lll self-assigned this Nov 14, 2024
@lll-phill-lll lll-phill-lll added the area/runtime YDB runtime issues label Nov 14, 2024
@lll-phill-lll
Copy link
Member Author

lll-phill-lll commented Nov 14, 2024

Same problem we had in the hash shuffle algorithm for channels. We were trying to fix it here: #4364. But had to revert it because of the compatibility issues

@yumkam
Copy link
Collaborator

yumkam commented Nov 14, 2024

XXH3_64Bits (from same xxh library) claimed to provide better performance for small-inputs (I have not benchmarked it)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/runtime YDB runtime issues
Projects
None yet
Development

No branches or pull requests

2 participants