We had the same problem in the hash shuffle algorithm for channels. We tried to fix it in #4364 but had to revert the fix because of compatibility issues.
As @vladl2802 noted in #11416, values are distributed unevenly between buckets during spilling.
The root cause is that we rely on a hash function here:
`ydb/yql/essentials/minikql/comp_nodes/mkql_wide_combine.cpp`, line 489 (commit 6dac5e0)
This turns out to be `std::hash`, which for integer values just returns the value itself: https://godbolt.org/z/es8dxMGeY
The hash function is set here:
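To illustrate the effect, here is a minimal sketch assuming 64-bit integer keys and a standard library (libstdc++ or libc++) where `std::hash<std::uint64_t>` is the identity. The helpers `CountBucketsWithStdHash` and `MakeStridedKeys` are hypothetical, not part of the YDB code, and the strided key set is an assumed workload, not the one from #11416. Because the bucket is then just the low 7 bits of the key, all keys that share those bits land in the same bucket.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Number of spilling buckets, matching the `hash % 128` scheme in this issue.
constexpr std::size_t kBucketCount = 128;

// Counts keys per bucket when the bucket is chosen as std::hash(key) % 128.
// Assumption: on libstdc++/libc++, std::hash<uint64_t> returns the key unchanged,
// so the bucket is simply the low 7 bits of the key.
std::vector<std::size_t> CountBucketsWithStdHash(const std::vector<std::uint64_t>& keys) {
    std::vector<std::size_t> counts(kBucketCount, 0);
    for (std::uint64_t key : keys) {
        const std::size_t h = std::hash<std::uint64_t>{}(key);
        ++counts[h % kBucketCount];
    }
    return counts;
}

// Hypothetical workload: n keys spaced `stride` apart, starting at 0.
std::vector<std::uint64_t> MakeStridedKeys(std::size_t n, std::uint64_t stride) {
    std::vector<std::uint64_t> keys;
    keys.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        keys.push_back(static_cast<std::uint64_t>(i) * stride);
    }
    return keys;
}
```

Under these assumptions, `CountBucketsWithStdHash(MakeStridedKeys(1000, 128))` puts all 1000 keys into bucket 0 and leaves the other 127 buckets empty.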
`ydb/yql/essentials/public/udf/udf_type_ops.h`, line 44 (commit c8e6180)
As a temporary measure, PR #11471 changes the bucket selection algorithm from `hash % 128` to `XXHASH(hash) % 128`.

Also, with `std::hash` we can run into compatibility issues when changing the MKQL_RUNTIME version.
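The temporary `XXHASH(hash) % 128` bucket selection can be sketched as follows. To stay self-contained, the sketch uses the SplitMix64 finalizer as a stand-in for XXHASH (it is not xxHash), and `MixHash`, `SelectBucket` and `CountBuckets` are hypothetical names. The idea is the same: an extra mixing step makes the low 7 bits used by `% 128` depend on all bits of the original hash.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kBucketCount = 128;

// Stand-in for XXHASH(hash): the SplitMix64 finalizer. It spreads every input
// bit across the 64-bit result, so the low bits depend on the high bits too.
std::uint64_t MixHash(std::uint64_t h) {
    h = (h ^ (h >> 30)) * 0xbf58476d1ce4e5b9ULL;
    h = (h ^ (h >> 27)) * 0x94d049bb133111ebULL;
    return h ^ (h >> 31);
}

// Bucket selection in the spirit of `XXHASH(hash) % 128` (hypothetical helper).
std::size_t SelectBucket(std::uint64_t hash) {
    return static_cast<std::size_t>(MixHash(hash) % kBucketCount);
}

// Counts how many of the given hashes end up in each bucket.
std::vector<std::size_t> CountBuckets(const std::vector<std::uint64_t>& hashes) {
    std::vector<std::size_t> counts(kBucketCount, 0);
    for (std::uint64_t h : hashes) {
        ++counts[SelectBucket(h)];
    }
    return counts;
}
```

Given hashes that are all multiples of 128, which collapse into bucket 0 under plain `hash % 128`, this selection spreads them roughly evenly across the 128 buckets. The extra mixing costs two multiplications and a few shifts per hash.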
So, this task proposes replacing `std::hash` with a different hash function. Hash functions to consider:
- rh hash: `ydb/yql/essentials/minikql/comp_nodes/mkql_rh_hash.h`, line 219 (commit c8e6180)
- xxHash: https://github.com/Cyan4973/xxHash. We already use xxHash in GraceJoin: `ydb/yql/essentials/minikql/comp_nodes/mkql_grace_join_imp.cpp`, line 78 (commit c8e6180)
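To compare the candidates above on the same key set, a small harness like the sketch below could be used. `BucketSkew` is a hypothetical helper, not an existing YDB function; each candidate (for example `std::hash`, or thin wrappers around the rh hash or the xxHash code already used in GraceJoin) is assumed to be passed in as a callable mapping a 64-bit value to a 64-bit hash. It reports the fullest bucket's load relative to a perfectly even split.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Ratio of the fullest bucket to the ideal (uniform) bucket size.
// 1.0 means perfectly even; bucketCount means every key hit the same bucket.
template <typename HashFn>
double BucketSkew(const std::vector<std::uint64_t>& keys, std::size_t bucketCount, HashFn hashFn) {
    assert(bucketCount > 0 && !keys.empty());
    std::vector<std::size_t> counts(bucketCount, 0);
    for (std::uint64_t key : keys) {
        // Same bucket selection rule as in this issue: hash % bucketCount.
        ++counts[static_cast<std::size_t>(hashFn(key) % bucketCount)];
    }
    const std::size_t maxLoad = *std::max_element(counts.begin(), counts.end());
    const double idealLoad = static_cast<double>(keys.size()) / static_cast<double>(bucketCount);
    return static_cast<double>(maxLoad) / idealLoad;
}
```

For keys that are all multiples of 128, `BucketSkew(keys, 128, [](std::uint64_t k) { return k; })` returns 128.0 because every key lands in bucket 0, while a well-mixed function should stay near 1.0 for large key sets. Running the same workloads through each candidate gives a quick first comparison before weighing speed and cross-version stability.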