@@ -185,37 +185,36 @@ A regular expression program is essentially a sequence of opcodes produced by
 the compiler plus various facts about the regular expression (such as whether
 it is anchored, its capture names, etc.).

-### The regex! macro (or why `regex::internal` exists)
-
-The `regex!` macro is defined in the `regex_macros` crate as a compiler plugin,
-which is maintained in this repository. The `regex!` macro compiles a regular
-expression at compile time into specialized Rust code.
-
-The `regex!` macro was written when this library was first conceived and
-unfortunately hasn't changed much since then. In particular, it encodes the
-entire Pike VM into stack allocated space (no heap allocation is done). When
-`regex!` was first written, this provided a substantial speed boost over
-so-called "dynamic" regexes compiled at runtime, and in particular had much
-lower overhead per match. This was because the only matching engine at the
-time was the Pike VM. The addition of other matching engines has inverted
-the relationship; the `regex!` macro is almost never faster than the dynamic
-variant. (In fact, it is typically substantially slower.)
-
-In order to build the `regex!` macro this way, it must have access to some
-internals of the regex library, which is in a distinct crate. (Compiler plugins
-must be part of a distinct crate.) Namely, it must be able to compile a regular
-expression and access its opcodes. The necessary internals are exported as part
-of the top-level `internal` module in the regex library, but is hidden from
-public documentation. In order to present a uniform API between programs build
-by the `regex!` macro and their dynamic analoges, the `Regex` type is an enum
-whose variants are hidden from public documentation.
-
-In the future, the `regex!` macro should probably work more like Ragel, but
-it's not clear how hard this is. In particular, the `regex!` macro should be
-able to support all the features of dynamic regexes, which may be hard to do
-with a Ragel-style implementation approach. (Which somewhat suggests that the
-`regex!` macro may also need to grow conditional execution logic like the
-dynamic variants, which seems rather grotesque.)
+### The regex! macro
+
+The `regex!` macro no longer exists. It was developed in a bygone era as a
+compiler plugin during the infancy of the regex crate. Back then, the only
+matching engine in the crate was the Pike VM. The `regex!` macro was, itself,
+also a Pike VM. The only advantages it offered over the dynamic Pike VM that
+was built at runtime were the following:
+
+1. Syntax checking was done at compile time. Your Rust program wouldn't
+   compile if your regex didn't compile.
+2. Reduction of overhead that was proportional to the size of the regex.
+   For the most part, this overhead consisted of heap allocation, which
+   was nearly eliminated in the compiler plugin.
+
+The main takeaway here is that the compiler plugin was a marginally faster
+version of a slow regex engine. As the regex crate evolved, it grew other regex
+engines (DFA, bounded backtracker) and sophisticated literal optimizations.
+The regex macro didn't keep pace, and it therefore became (dramatically) slower
+than the dynamic engines. The only reason left to use it was for the compile
+time guarantee that your regex is correct. Fortunately, Clippy (the Rust lint
+tool) has a lint that checks the validity of your regular expressions, which
+mostly replaces that use case.
+
+Additionally, the regex compiler plugin stopped receiving maintenance. Nobody
+complained. At that point, it seemed prudent to just remove it.
+
+Will a compiler plugin be brought back? The future is murky, but there is
+definitely an opportunity there to build something that is faster than the
+dynamic engines in some cases. But it will be challenging! As of now, there
+are no plans to work on this.


 ## Testing
@@ -236,7 +235,6 @@ the AT&T test suite) and code generate tests for each matching engine. The
 approach we use in this library is to create a Cargo.toml entry point for each
 matching engine we want to test. The entry points are:

-* `tests/test_plugin.rs` - tests the `regex!` macro
 * `tests/test_default.rs` - tests `Regex::new`
 * `tests/test_default_bytes.rs` - tests `bytes::Regex::new`
 * `tests/test_nfa.rs` - tests `Regex::new`, forced to use the NFA
@@ -261,18 +259,14 @@ entry points, it can take a while to compile everything. To reduce compile
 times slightly, try using `cargo test --test default`, which will only use the
 `tests/test_default.rs` entry point.

-N.B. To run tests for the `regex!` macro, use:
-
-    cargo test --manifest-path regex_macros/Cargo.toml
-

 ## Benchmarking

 The benchmarking in this crate is made up of many micro-benchmarks. Currently,
 there are two primary sets of benchmarks: the benchmarks that were adopted
-at this library's inception (in `benches/src/misc.rs`) and a newer set of
+at this library's inception (in `bench/src/misc.rs`) and a newer set of
 benchmarks meant to test various optimizations. Specifically, the latter set
-contain some analysis and are in `benches/src/sherlock.rs`. Also, the latter
+contain some analysis and are in `bench/src/sherlock.rs`. Also, the latter
 set are all executed on the same lengthy input whereas the former benchmarks
 are executed on strings of varying length.

@@ -284,7 +278,6 @@ separately from the main regex crate.
 Benchmarking follows a similarly wonky setup as tests. There are multiple entry
 points:

-* `bench_rust_plugin.rs` - benchmarks the `regex!` macro
 * `bench_rust.rs` - benchmarks `Regex::new`
 * `bench_rust_bytes.rs` benchmarks `bytes::Regex::new`
 * `bench_pcre.rs` - benchmarks PCRE
@@ -299,36 +292,36 @@ library benchmarks (especially RE2).
 If you're hacking on one of the matching engines and just want to see
 benchmarks, then all you need to run is:

-    $ ./run-bench rust
+    $ ./bench/run rust

 If you want to compare your results with older benchmarks, then try:

-    $ ./run-bench rust | tee old
+    $ ./bench/run rust | tee old
     $ ... make it faster
-    $ ./run-bench rust | tee new
-    $ cargo-benchcmp old new --improvements
+    $ ./bench/run rust | tee new
+    $ cargo benchcmp old new --improvements

 The `cargo-benchcmp` utility is available here:
 https://github.com/BurntSushi/cargo-benchcmp

-The `run-bench` utility can run benchmarks for PCRE and Oniguruma too. See
-`./run-bench --help`.
+The `./bench/run` utility can run benchmarks for PCRE and Oniguruma too. See
+`./bench/bench --help`.

 ## Dev Docs

 When digging your teeth into the codebase for the first time, the
 crate documentation can be a great resource. By default `rustdoc`
 will strip out all documentation of private crate members in an
 effort to help consumers of the crate focus on the *interface*
-without having to concern themselves with the *implimentation*.
+without having to concern themselves with the *implementation*.
 Normally this is a great thing, but if you want to start hacking
 on regex internals it is not what you want. Many of the private members
 of this crate are well documented with rustdoc style comments, and
 it would be a shame to miss out on the opportunity that presents.
 You can generate the private docs with:

 ```
-> rustdoc --crate-name docs src/lib.rs -o target/doc -L target/debug/deps --no-defaults --passes collapse-docs --passes unindent-comments
+$ rustdoc --crate-name docs src/lib.rs -o target/doc -L target/debug/deps --no-defaults --passes collapse-docs --passes unindent-comments
 ```

 Then just point your browser at `target/doc/regex/index.html`.