plan: compute all inner joins in memory if they fit
Fixes src-d#577
Because we do not have a way to estimate the cost of each side of
a join, it is really difficult to know when we can compute one in
memory. Not doing so, however, makes inner joins painfully slow,
as one of the branches is iterated multiple times.
This PR addresses this by ensuring that if the right branch of the
inner join fits in memory, it will be computed in memory even if
the in-memory mode has not been activated by the user.
A user can set the maximum amount of memory the gitbase server
can use before joins are no longer performed in memory, using
the `MAX_MEMORY_INNER_JOIN` environment variable or the
`max_memory_joins` session variable, specifying the number of
bytes. The default value is half of the available physical
memory on the operating system.
Previously we had two iterators, `innerJoinIter` and
`innerJoinMemoryIter`. Since `innerJoinIter` must now be able to do
the join in memory, `innerJoinMemoryIter` has been removed and
`innerJoinIter` replaced with a version that can work in three
modes:
- `unknownMode`: we don't know yet how to perform the join, so keep
iterating until we can find out. By the end of the first full pass
over the right branch, the iterator will have switched from
`unknownMode` to either `multipassMode` or `memoryMode`.
- `memoryMode`: computes the rest of the join in memory. The
iterator can be in this mode before it starts iterating if the user
activated the in-memory join via session or environment variables, in
which case it will load the whole right side in memory before doing
any further iteration. If instead the iterator started in
`unknownMode` and switched to this mode, it is guaranteed to have
already loaded the whole right side. From that point on, both cases
work in exactly the same way.
- `multipassMode`: the previous default mode. It iterates the
right side of the join for each row in the left side. More expensive,
but less memory consuming. The iterator cannot start in this mode;
it can only be switched to it from `unknownMode` when the memory
used by the gitbase server exceeds the maximum amount of memory,
either set by the user or by default.
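The mode transitions above can be sketched roughly as follows. This is
a simplified illustration, not the `innerJoinIter` from this commit:
`joinSketch`, `row`, `maxRows` and `rightScans` are hypothetical names,
both sides are plain slices, and the memory check is approximated by a
row count rather than the server's actual memory usage. For brevity,
an iterator that starts in `memoryMode` fills its cache during the
first pass instead of in a separate loading step.

```go
package main

type joinMode int

const (
	unknownMode joinMode = iota
	memoryMode
	multipassMode
)

// row is a simplified stand-in for a SQL row.
type row []int

// joinSketch is a hypothetical, slice-based stand-in for the iterator.
type joinSketch struct {
	mode       joinMode
	maxRows    int // stand-in for the memory threshold, counted in rows
	rightScans int // how many times the right side was scanned from its source
}

// run joins left and right using match as the join condition.
func (j *joinSketch) run(left, right []row, match func(l, r row) bool) []row {
	var out, cache []row
	loaded := false // true once the whole right side is in cache

	for _, l := range left {
		src := right
		if j.mode == memoryMode && loaded {
			src = cache // serve the right side from memory
		} else {
			j.rightScans++
		}

		for _, r := range src {
			if !loaded && j.mode != multipassMode {
				if j.mode == unknownMode && len(cache) >= j.maxRows {
					// Threshold exceeded: give up on caching.
					j.mode = multipassMode
					cache = nil
				} else {
					cache = append(cache, r)
				}
			}
			if match(l, r) {
				out = append(out, append(append(row{}, l...), r...))
			}
		}

		// A full pass finished without exceeding the threshold.
		if j.mode != multipassMode && !loaded {
			j.mode = memoryMode
			loaded = true
		}
	}
	return out
}
```

In this sketch, a right side that fits within `maxRows` is scanned from
its source only once, while one that does not fit is scanned once per
left row, which is the cost difference the commit is addressing.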
Signed-off-by: Miguel Molina <[email protected]>