As hinted at in the previous commit, I completely missed the
significance of
    using IntMap = SparseArray<int>;
the first time around and just used whatever worked, but it turns out
that was the entire source of my trouble: finding an entry in `IntMap`
takes just two array lookups (and a few checks), not an entire siphash
computation, which is a ton cheaper.
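For illustration, here is a minimal sketch of the classic sparse-set layout that makes a lookup cost two array reads. It is not a port of re2's `SparseArray`; the `SparseSet` name and its methods are hypothetical, and it only tracks membership for integers below a fixed capacity.

```rust
/// Minimal sparse set over integers in `0..capacity` (hypothetical type,
/// not re2's `SparseArray`).
pub struct SparseSet {
    /// Members, in insertion order.
    dense: Vec<usize>,
    /// `sparse[v]` is the position of `v` in `dense` when `v` is a member;
    /// otherwise it holds a stale value that `contains` rejects.
    sparse: Vec<usize>,
}

impl SparseSet {
    pub fn new(capacity: usize) -> Self {
        SparseSet {
            dense: Vec::with_capacity(capacity),
            sparse: vec![0; capacity],
        }
    }

    /// Two array lookups plus bounds checks, no hashing involved.
    pub fn contains(&self, value: usize) -> bool {
        match self.sparse.get(value) {
            Some(&i) => self.dense.get(i) == Some(&value),
            None => false,
        }
    }

    /// Returns `false` if `value` is already present or out of range.
    pub fn insert(&mut self, value: usize) -> bool {
        if value >= self.sparse.len() || self.contains(value) {
            return false;
        }
        self.sparse[value] = self.dense.len();
        self.dense.push(value);
        true
    }

    /// Clearing only truncates `dense`; stale `sparse` entries are harmless.
    pub fn clear(&mut self) {
        self.dense.clear();
    }

    /// Iterates members in insertion order.
    pub fn iter(&self) -> impl Iterator<Item = usize> + '_ {
        self.dense.iter().copied()
    }
}
```

The trade-off is memory proportional to the largest possible index, in exchange for lookups, inserts and clears that never touch a hash function.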
Eventually we might want to implement a specialised version for our
exact needs (as we're still going through an entire hashing and entry
lookup machinery we don't need), but for now get somewhat close
using an `IndexSet<NoHashHasher>`: `IndexSet` provides the ordering
and fast iteration, and NoHash provides *nearly* free hashing. Since
the inputs are the output of aho and a few other processes, they are
unlikely to be very adversarial, at least in a problematic way.
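As a rough sketch of what "nearly free hashing" means, the following identity hasher passes an integer key through unchanged. It uses only the standard library, so `HashSet` stands in for `IndexSet` here (it does not preserve insertion order); `IdentityHasher` and `IdentitySet` are hypothetical names, not the types used in this commit.

```rust
use std::collections::HashSet;
use std::hash::{BuildHasherDefault, Hasher};

/// Hypothetical pass-through hasher: the "hash" of an integer is the
/// integer itself, so hashing costs a single store.
#[derive(Default)]
pub struct IdentityHasher(u64);

impl Hasher for IdentityHasher {
    fn finish(&self) -> u64 {
        self.0
    }

    // Arbitrary byte input is rejected: this hasher is only meant for
    // plain integer keys.
    fn write(&mut self, _bytes: &[u8]) {
        panic!("IdentityHasher only supports integer keys");
    }

    fn write_u32(&mut self, n: u32) {
        self.0 = u64::from(n);
    }

    fn write_u64(&mut self, n: u64) {
        self.0 = n;
    }

    fn write_usize(&mut self, n: usize) {
        self.0 = n as u64;
    }
}

/// Std-only stand-in for an ordered set with a no-op hasher.
pub type IdentitySet = HashSet<usize, BuildHasherDefault<IdentityHasher>>;
```

An `IdentitySet::default()` then behaves like any other set, minus the siphash cost. Skipping the hash also gives up its protection against crafted colliding keys, which is why the point about the inputs not being adversarial matters.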
Also as noted previously, don't even bother for the counts, just use a
vec as that's the only bit that's needed. Tried to use a
stack-allocated TinyVec but that turns out to consistently be ~5%
slower, with ~7% more cycles needed (though only 2% more
instructions).
Here's the `time(1)` report after this change:
```
46.63 real 46.25 user 0.03 sys
143327232 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
8884 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
0 voluntary context switches
131 involuntary context switches
498048446439 instructions retired
149096223231 cycles elapsed
139855488 peak memory footprint
```
Interestingly, while the total runtime is about on par with re2 (even
more so if we consider that the rust script spends ~0.1s loading and
compiling the prefilters), rust-regex now retires 10% fewer
instructions but burns only 1.5% fewer cycles than re2. So it likely
still has memory access issues, most likely from misusing `IndexSet`.
Also it's improved from 3.14 IPC to 3.34, which is great, but the re2
script is at 3.67. In all honesty I don't know where that gap comes
from; I assume from having more expensive instructions like modulos
for the hashset.
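For reference, the 3.34 figure is just the ratio of the two counters in the report above. A tiny sketch of the arithmetic (the `ipc` helper is a hypothetical name):

```rust
/// Instructions per cycle from the "instructions retired" and
/// "cycles elapsed" counters (hypothetical helper).
pub fn ipc(instructions_retired: u64, cycles_elapsed: u64) -> f64 {
    instructions_retired as f64 / cycles_elapsed as f64
}
```

With the values from the report, `ipc(498_048_446_439, 149_096_223_231)` comes out at roughly 3.34.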
At least it doesn't look like the branches are too much of an issue...
Also I tried using `get_unchecked` for the counts in case that did
something; it does not (at least not consistently), so it's probably
not something I need to investigate in the future, even if I will
undoubtedly forget.