Commit a1cda64

More Detailed References
1 parent 5a64040 commit a1cda64

1 file changed

+59
-21
lines changed

src/simd_accel/teddy128.rs

@@ -1,11 +1,12 @@
11
/*!
22
Teddy is a simd accelerated multiple substring matching algorithm. The name
3-
and the core ideas in the algorithm were learned from the Hyperscan[1]
3+
and the core ideas in the algorithm were learned from the [Hyperscan][1_u]
44
project.
55
66
77
Background
88
----------
9+
910
The key idea of Teddy is to do *packed* substring matching. In the literature,
1011
packed substring matching is the idea of examining multiple bytes in a haystack
1112
at a time to detect matches. Implementations of, for example, memchr (which
@@ -15,20 +16,20 @@ extended to substring matching. The PCMPESTRI instruction (and its relatives),
1516
for example, implements substring matching in hardware. It is, however, limited
1617
to substrings of length 16 bytes or fewer, but this restriction is fine in a
1718
regex engine, since we rarely care about the performance difference between
18-
searching for a 16 byte literal and a 16 + N literal---16 is already long
19+
searching for a 16 byte literal and a 16 + N literal: 16 is already long
1920
enough. The key downside of the PCMPESTRI instruction, on current (2016) CPUs
2021
at least, is its latency and throughput. As a result, it is often faster to do
2122
substring search with a Boyer-Moore variant and a well placed memchr to quickly
2223
skip through the haystack.
2324
2425
There are fewer results from the literature on packed substring matching,
25-
and even fewer for packed multiple substring matching. Ben-Kiki et al.[2]
26+
and even fewer for packed multiple substring matching. Ben-Kiki et al. [2]
2627
describe the use of PCMPESTRI for substring matching, but the work is mostly theoretical
27-
and hand-waves performance. There is other theoretical work done by Bille[3]
28+
and hand-waves performance. There is other theoretical work done by Bille [3]
2829
as well.
2930
3031
The rest of the work in the field, as far as I'm aware, is by Faro and Kulekci
31-
and is generally focused on multiple pattern search. Their first paper[4a]
32+
and is generally focused on multiple pattern search. Their first paper [4a]
3233
introduces the concept of a fingerprint, which is computed for every block of
3334
N bytes in every pattern. The haystack is then scanned N bytes at a time and
3435
a fingerprint is computed in the same way it was computed for blocks in the
@@ -44,13 +45,13 @@ presumably because of how the algorithm uses certain SIMD instructions. This
4445
essentially makes it useless for general purpose regex matching, where a small
4546
number of short patterns is far more likely.
4647
47-
Faro and Kulekci published another paper[4b] that is conceptually very similar
48+
Faro and Kulekci published another paper [4b] that is conceptually very similar
4849
to [4a]. The key difference is that it uses the CRC32 instruction (introduced
4950
as part of SSE 4.2) to compute fingerprint values. This also enables the
5051
algorithm to work effectively on substrings as short as 7 bytes with 4 byte
5152
windows. 7 bytes is unfortunately still too long. The window could be
5253
technically shrunk to 2 bytes, thereby reducing minimum length to 3, but the
53-
small window size ends up negating most performance benefits---and it's likely
54+
small window size ends up negating most performance benefits, and it's likely
5455
the common case in a general purpose regex engine.
5556
5657
Faro and Kulekci also published [4c] that appears to be intended as a
@@ -59,7 +60,7 @@ the high throughput/latency time of PCMPESTRI and therefore chooses other SIMD
5960
instructions that are faster. While this approach works for short substrings,
6061
I personally couldn't see a way to generalize it to multiple substring search.
6162
62-
Faro and Kulekci have another paper[4d] that I haven't been able to read
63+
Faro and Kulekci have another paper [4d] that I haven't been able to read
6364
because it is behind a paywall.
6465
6566
@@ -69,8 +70,8 @@ Finally, we get to Teddy. If the above literature review is complete, then it
6970
appears that Teddy is a novel algorithm. More than that, in my experience, it
7071
completely blows away the competition for short substrings, which is exactly
7172
what we want in a general purpose regex engine. Again, the algorithm appears
72-
to be developed by the authors of Hyperscan[1]. Hyperscan was open sourced late
73-
2015, and no earlier history could be found. Therefore, tracking the exact
73+
to be developed by the authors of [Hyperscan][1_u]. Hyperscan was open sourced
74+
late 2015, and no earlier history could be found. Therefore, tracking the exact
7475
provenance of the algorithm with respect to the published literature seems
7576
difficult.
7677
@@ -142,8 +143,8 @@ How do we perform lookup though? It turns out that SSSE3 introduced a very cool
142143
instruction called PSHUFB. The instruction takes two SIMD vectors, `A` and `B`,
143144
and returns a third vector `C`. All vectors are treated as 16 8-bit integers.
144145
`C` is formed by `C[i] = A[B[i]]`. (This is a bit of a simplification, but true
145-
for the purposes of this algorithm. For full details, see Intel's Intrinsics
146-
Guide[5].) This essentially lets us use the values in `B` to lookup values in
146+
for the purposes of this algorithm. For full details, see [Intel's Intrinsics
147+
Guide][5_u].) This essentially lets us use the values in `B` to look up values in
147148
`A`.
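
As a rough illustration of the lookup described above, here is a scalar model of PSHUFB on 16-byte vectors. The function name `pshufb_model` is hypothetical and does not appear in this commit; it is a sketch of the instruction's per-lane behavior, not the implementation used in `teddy128.rs`. It also spells out the part the text calls a simplification: only the low four bits of each byte in `B` select a lane of `A`, and a set high bit produces zero.

```rust
/// Scalar model of PSHUFB on 16-byte vectors.
/// Hypothetical helper for illustration only; not part of this commit.
fn pshufb_model(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut c = [0u8; 16];
    for i in 0..16 {
        c[i] = if b[i] & 0x80 != 0 {
            // A set high bit in the index byte zeroes the output lane.
            0
        } else {
            // Otherwise the low four bits pick one of the 16 lanes of `a`.
            a[(b[i] & 0x0F) as usize]
        };
    }
    c
}
```

When every byte of `b` is in the range 0 to 15, this reduces to the `C[i] = A[B[i]]` form given in the text.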
148149
149150
If we could somehow cause `B` to contain our 16 byte block from the haystack,
@@ -268,15 +269,52 @@ The way to extend it is:
268269
269270
The implementation below is commented to fill in the nitty gritty details.
270271
271-
[1] - https://github.com/01org/hyperscan
272-
[2a] - http://drops.dagstuhl.de/opus/volltexte/2011/3355/pdf/37.pdf
273-
[2b] - http://www.cs.haifa.ac.il/~oren/Publications/bpsm.pdf
274-
[3] - http://www.sciencedirect.com/science/article/pii/S1570866710000353
275-
[4a] - http://www.dmi.unict.it/~faro/papers/conference/faro32.pdf
276-
[4b] - https://pdfs.semanticscholar.org/fed7/ca62dc469314f3552017d0da7ebd669d4649.pdf
277-
[4c] - http://arxiv.org/pdf/1209.6449.pdf
278-
[4d] - http://www.sciencedirect.com/science/article/pii/S1570866714000471
279-
[5] - https://software.intel.com/sites/landingpage/IntrinsicsGuide
272+
References
273+
----------
274+
275+
- **[1]** [Hyperscan on GitHub](https://github.com/01org/hyperscan),
276+
[webpage](https://01.org/hyperscan)
277+
- **[2a]** Ben-Kiki, O., Bille, P., Breslauer, D., Gasieniec, L., Grossi, R.,
278+
& Weimann, O. (2011).
279+
_Optimal packed string matching_.
280+
In LIPIcs-Leibniz International Proceedings in Informatics (Vol. 13).
281+
Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik.
282+
DOI: 10.4230/LIPIcs.FSTTCS.2011.423.
283+
[PDF](http://drops.dagstuhl.de/opus/volltexte/2011/3355/pdf/37.pdf).
284+
- **[2b]** Ben-Kiki, O., Bille, P., Breslauer, D., Gąsieniec, L., Grossi, R.,
285+
& Weimann, O. (2014).
286+
_Towards optimal packed string matching_.
287+
Theoretical Computer Science, 525, 111-129.
288+
DOI: 10.1016/j.tcs.2013.06.013.
289+
[PDF](http://www.cs.haifa.ac.il/~oren/Publications/bpsm.pdf).
290+
- **[3]** Bille, P. (2011).
291+
_Fast searching in packed strings_.
292+
Journal of Discrete Algorithms, 9(1), 49-56.
293+
DOI: 10.1016/j.jda.2010.09.003.
294+
[PDF](http://www.sciencedirect.com/science/article/pii/S1570866710000353).
295+
- **[4a]** Faro, S., & Külekci, M. O. (2012, October).
296+
_Fast multiple string matching using streaming SIMD extensions technology_.
297+
In String Processing and Information Retrieval (pp. 217-228).
298+
Springer Berlin Heidelberg.
299+
DOI: 10.1007/978-3-642-34109-0_23.
300+
[PDF](http://www.dmi.unict.it/~faro/papers/conference/faro32.pdf).
301+
- **[4b]** Faro, S., & Külekci, M. O. (2013, September).
302+
_Towards a Very Fast Multiple String Matching Algorithm for Short Patterns_.
303+
In Stringology (pp. 78-91).
304+
[PDF](http://www.dmi.unict.it/~faro/papers/conference/faro36.pdf).
305+
- **[4c]** Faro, S., & Külekci, M. O. (2013, January).
306+
_Fast packed string matching for short patterns_.
307+
In Proceedings of the Meeting on Algorithm Engineering & Experiments
308+
(pp. 113-121).
309+
Society for Industrial and Applied Mathematics.
310+
[PDF](http://arxiv.org/pdf/1209.6449.pdf).
311+
- **[4d]** Faro, S., & Külekci, M. O. (2014).
312+
_Fast and flexible packed string matching_.
313+
Journal of Discrete Algorithms, 28, 61-72.
314+
DOI: 10.1016/j.jda.2014.07.003.
315+
316+
[1_u]: https://github.com/01org/hyperscan
317+
[5_u]: https://software.intel.com/sites/landingpage/IntrinsicsGuide
280318
*/
281319

282320
// TODO: Extend this to use AVX2 instructions.
