/*!
Teddy is a SIMD accelerated multiple substring matching algorithm. The name
- and the core ideas in the algorithm were learned from the Hyperscan[1]
+ and the core ideas in the algorithm were learned from the [Hyperscan][1_u]
project.


Background
----------
+
The key idea of Teddy is to do *packed* substring matching. In the literature,
packed substring matching is the idea of examining multiple bytes in a haystack
at a time to detect matches. Implementations of, for example, memchr (which
@@ -15,20 +16,20 @@ extended to substring matching. The PCMPESTRI instruction (and its relatives),
for example, implements substring matching in hardware. It is, however, limited
to substrings of length 16 bytes or fewer, but this restriction is fine in a
regex engine, since we rarely care about the performance difference between
- searching for a 16 byte literal and a 16 + N literal--- 16 is already long
+ searching for a 16 byte literal and a 16 + N literal— 16 is already long
enough. The key downside of the PCMPESTRI instruction, on current (2016) CPUs
at least, is its latency and throughput. As a result, it is often faster to do
substring search with a Boyer-Moore variant and a well placed memchr to quickly
skip through the haystack.

There are fewer results from the literature on packed substring matching,
- and even fewer for packed multiple substring matching. Ben-Kiki et al.[2]
+ and even fewer for packed multiple substring matching. Ben-Kiki et al. [2]
describes use of PCMPESTRI for substring matching, but is mostly theoretical
- and hand-waves performance. There is other theoretical work done by Bille[3]
+ and hand-waves performance. There is other theoretical work done by Bille [3]
as well.

The rest of the work in the field, as far as I'm aware, is by Faro and Kulekci
- and is generally focused on multiple pattern search. Their first paper[4a]
+ and is generally focused on multiple pattern search. Their first paper [4a]
introduces the concept of a fingerprint, which is computed for every block of
N bytes in every pattern. The haystack is then scanned N bytes at a time and
a fingerprint is computed in the same way it was computed for blocks in the
@@ -44,13 +45,13 @@ presumably because of how the algorithm uses certain SIMD instructions. This
essentially makes it useless for general purpose regex matching, where a small
number of short patterns is far more likely.

- Faro and Kulekci published another paper[4b] that is conceptually very similar
+ Faro and Kulekci published another paper [4b] that is conceptually very similar
to [4a]. The key difference is that it uses the CRC32 instruction (introduced
as part of SSE 4.2) to compute fingerprint values. This also enables the
algorithm to work effectively on substrings as short as 7 bytes with 4 byte
windows. 7 bytes is unfortunately still too long. The window could be
technically shrunk to 2 bytes, thereby reducing minimum length to 3, but the
- small window size ends up negating most performance benefits--- and it's likely
+ small window size ends up negating most performance benefits— and it's likely
the common case in a general purpose regex engine.

Faro and Kulekci also published [4c] that appears to be intended as a
@@ -59,7 +60,7 @@ the high throughput/latency time of PCMPESTRI and therefore chooses other SIMD
instructions that are faster. While this approach works for short substrings,
I personally couldn't see a way to generalize it to multiple substring search.

- Faro and Kulekci have another paper[4d] that I haven't been able to read
+ Faro and Kulekci have another paper [4d] that I haven't been able to read
because it is behind a paywall.
@@ -69,8 +70,8 @@ Finally, we get to Teddy. If the above literature review is complete, then it
appears that Teddy is a novel algorithm. More than that, in my experience, it
completely blows away the competition for short substrings, which is exactly
what we want in a general purpose regex engine. Again, the algorithm appears
- to be developed by the authors of Hyperscan[1]. Hyperscan was open sourced late
- 2015, and no earlier history could be found. Therefore, tracking the exact
+ to be developed by the authors of [Hyperscan][1_u]. Hyperscan was open sourced
+ late 2015, and no earlier history could be found. Therefore, tracking the exact
provenance of the algorithm with respect to the published literature seems
difficult.
@@ -142,8 +143,8 @@ How do we perform lookup though? It turns out that SSSE3 introduced a very cool
instruction called PSHUFB. The instruction takes two SIMD vectors, `A` and `B`,
and returns a third vector `C`. All vectors are treated as 16 8-bit integers.
`C` is formed by `C[i] = A[B[i]]`. (This is a bit of a simplification, but true
- for the purposes of this algorithm. For full details, see Intel's Intrinsics
- Guide[5].) This essentially lets us use the values in `B` to look up values in
+ for the purposes of this algorithm. For full details, see [Intel's Intrinsics
+ Guide][5_u].) This essentially lets us use the values in `B` to look up values in
`A`.
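
As a rough illustration of the lookup semantics described above, here is a
scalar model of PSHUFB. The name `pshufb_model` is hypothetical and is not part
of this module; it only mirrors what the instruction computes for each byte.
It also spells out the simplification mentioned above: only the low 4 bits of
each `B[i]` are used as an index, and a set high bit zeroes the output byte.

```rust
// Hypothetical scalar model of PSHUFB (illustration only, not used by Teddy).
// Each of the 16 output bytes is selected from `a` by the matching byte of `b`.
fn pshufb_model(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut c = [0u8; 16];
    for i in 0..16 {
        c[i] = if b[i] & 0x80 != 0 {
            // High bit set: the hardware writes zero to this lane.
            0
        } else {
            // Otherwise the low 4 bits of B[i] index into A.
            a[(b[i] & 0x0F) as usize]
        };
    }
    c
}
```

When `A` holds a 16-entry table and every byte of `B` is in the range 0 to 15,
this reduces to exactly `C[i] = A[B[i]]`.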

If we could somehow cause `B` to contain our 16 byte block from the haystack,
@@ -268,15 +269,52 @@ The way to extend it is:

The implementation below is commented to fill in the nitty gritty details.

- [1] - https://github.com/01org/hyperscan
- [2a] - http://drops.dagstuhl.de/opus/volltexte/2011/3355/pdf/37.pdf
- [2b] - http://www.cs.haifa.ac.il/~oren/Publications/bpsm.pdf
- [3] - http://www.sciencedirect.com/science/article/pii/S1570866710000353
- [4a] - http://www.dmi.unict.it/~faro/papers/conference/faro32.pdf
- [4b] - https://pdfs.semanticscholar.org/fed7/ca62dc469314f3552017d0da7ebd669d4649.pdf
- [4c] - http://arxiv.org/pdf/1209.6449.pdf
- [4d] - http://www.sciencedirect.com/science/article/pii/S1570866714000471
- [5] - https://software.intel.com/sites/landingpage/IntrinsicsGuide
+ References
+ ----------
+
+ - **[1]** [Hyperscan on GitHub](https://github.com/01org/hyperscan),
+ [webpage](https://01.org/hyperscan)
+ - **[2a]** Ben-Kiki, O., Bille, P., Breslauer, D., Gąsieniec, L., Grossi, R.,
+ & Weimann, O. (2011).
+ _Optimal packed string matching_.
+ In LIPIcs-Leibniz International Proceedings in Informatics (Vol. 13).
+ Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik.
+ DOI: 10.4230/LIPIcs.FSTTCS.2011.423.
+ [PDF](http://drops.dagstuhl.de/opus/volltexte/2011/3355/pdf/37.pdf).
+ - **[2b]** Ben-Kiki, O., Bille, P., Breslauer, D., Gąsieniec, L., Grossi, R.,
+ & Weimann, O. (2014).
+ _Towards optimal packed string matching_.
+ Theoretical Computer Science, 525, 111-129.
+ DOI: 10.1016/j.tcs.2013.06.013.
+ [PDF](http://www.cs.haifa.ac.il/~oren/Publications/bpsm.pdf).
+ - **[3]** Bille, P. (2011).
+ _Fast searching in packed strings_.
+ Journal of Discrete Algorithms, 9(1), 49-56.
+ DOI: 10.1016/j.jda.2010.09.003.
+ [PDF](http://www.sciencedirect.com/science/article/pii/S1570866710000353).
+ - **[4a]** Faro, S., & Külekci, M. O. (2012, October).
+ _Fast multiple string matching using streaming SIMD extensions technology_.
+ In String Processing and Information Retrieval (pp. 217-228).
+ Springer Berlin Heidelberg.
+ DOI: 10.1007/978-3-642-34109-0_23.
+ [PDF](http://www.dmi.unict.it/~faro/papers/conference/faro32.pdf).
+ - **[4b]** Faro, S., & Külekci, M. O. (2013, September).
+ _Towards a Very Fast Multiple String Matching Algorithm for Short Patterns_.
+ In Stringology (pp. 78-91).
+ [PDF](http://www.dmi.unict.it/~faro/papers/conference/faro36.pdf).
+ - **[4c]** Faro, S., & Külekci, M. O. (2013, January).
+ _Fast packed string matching for short patterns_.
+ In Proceedings of the Meeting on Algorithm Engineering & Experiments
+ (pp. 113-121).
+ Society for Industrial and Applied Mathematics.
+ [PDF](http://arxiv.org/pdf/1209.6449.pdf).
+ - **[4d]** Faro, S., & Külekci, M. O. (2014).
+ _Fast and flexible packed string matching_.
+ Journal of Discrete Algorithms, 28, 61-72.
+ DOI: 10.1016/j.jda.2014.07.003.
+
+ [1_u]: https://github.com/01org/hyperscan
+ [5_u]: https://software.intel.com/sites/landingpage/IntrinsicsGuide
*/

// TODO: Extend this to use AVX2 instructions.