Commit 5ca056d

Auto merge of #436 - rust-lang:ag/misc-fixes, r=BurntSushi
remove regex plugin + rollup + chores

This PR:

* Removes the regex compiler plugin. It's been broken for quite some time and nobody has seemed to notice. It's time for it to go. See commit cc7b00c for details.
* Setup a Cargo workspace for this repo.
* Update deps in various places. This includes updating simd to `0.2.1`, which fixes a build failure on Rust nightly.
* Name the frequency analysis based memchr search "freqy packed."
* Rolls up the other open PRs #401, #410 and #433.

2 parents 83c0b2f + 5ea594e commit 5ca056d

34 files changed, +366 -2179 lines changed

.travis.yml

+4
@@ -26,3 +26,7 @@ env:
 notifications:
   email:
     on_success: never
+branches:
+  only:
+    - master
+    - auto

Cargo.toml

+13-4
@@ -5,7 +5,7 @@ authors = ["The Rust Project Developers"]
 license = "MIT/Apache-2.0"
 readme = "README.md"
 repository = "https://github.com/rust-lang/regex"
-documentation = "https://doc.rust-lang.org/regex"
+documentation = "https://docs.rs/regex"
 homepage = "https://github.com/rust-lang/regex"
 description = """
 An implementation of regular expressions for Rust. This implementation uses
@@ -17,6 +17,9 @@ categories = ["text-processing"]
 travis-ci = { repository = "rust-lang/regex" }
 appveyor = { repository = "rust-lang-libs/regex" }

+[workspace]
+members = ["bench", "regex-capi", "regex-debug", "regex-syntax"]
+
 [dependencies]
 # For very fast prefix literal matching.
 aho-corasick = "0.6.0"
@@ -27,17 +30,17 @@ thread_local = "0.3.2"
 # For parsing regular expressions.
 regex-syntax = { path = "regex-syntax", version = "0.4.1" }
 # For accelerating text search.
-simd = { version = "0.1.1", optional = true }
+simd = { version = "0.2.1", optional = true }
 # For compiling UTF-8 decoding into automata.
 utf8-ranges = "1.0.0"

 [dev-dependencies]
 # For examples.
 lazy_static = "1"
 # For property based tests.
-quickcheck = { version = "0.5", default-features = false }
+quickcheck = { version = "0.6", default-features = false }
 # For generating random test data.
-rand = "0.3.15"
+rand = "0.4"

 [features]
 # Enable to use the unstable pattern traits defined in std.
@@ -94,5 +97,11 @@ name = "backtrack-utf8bytes"
 path = "tests/test_backtrack_bytes.rs"
 name = "backtrack-bytes"

+[profile.release]
+debug = true
+
+[profile.bench]
+debug = true
+
 [profile.test]
 debug = true

HACKING.md

+40-47
@@ -185,37 +185,36 @@ A regular expression program is essentially a sequence of opcodes produced by
 the compiler plus various facts about the regular expression (such as whether
 it is anchored, its capture names, etc.).

-### The regex! macro (or why `regex::internal` exists)
-
-The `regex!` macro is defined in the `regex_macros` crate as a compiler plugin,
-which is maintained in this repository. The `regex!` macro compiles a regular
-expression at compile time into specialized Rust code.
-
-The `regex!` macro was written when this library was first conceived and
-unfortunately hasn't changed much since then. In particular, it encodes the
-entire Pike VM into stack allocated space (no heap allocation is done). When
-`regex!` was first written, this provided a substantial speed boost over
-so-called "dynamic" regexes compiled at runtime, and in particular had much
-lower overhead per match. This was because the only matching engine at the
-time was the Pike VM. The addition of other matching engines has inverted
-the relationship; the `regex!` macro is almost never faster than the dynamic
-variant. (In fact, it is typically substantially slower.)
-
-In order to build the `regex!` macro this way, it must have access to some
-internals of the regex library, which is in a distinct crate. (Compiler plugins
-must be part of a distinct crate.) Namely, it must be able to compile a regular
-expression and access its opcodes. The necessary internals are exported as part
-of the top-level `internal` module in the regex library, but is hidden from
-public documentation. In order to present a uniform API between programs build
-by the `regex!` macro and their dynamic analoges, the `Regex` type is an enum
-whose variants are hidden from public documentation.
-
-In the future, the `regex!` macro should probably work more like Ragel, but
-it's not clear how hard this is. In particular, the `regex!` macro should be
-able to support all the features of dynamic regexes, which may be hard to do
-with a Ragel-style implementation approach. (Which somewhat suggests that the
-`regex!` macro may also need to grow conditional execution logic like the
-dynamic variants, which seems rather grotesque.)
+### The regex! macro
+
+The `regex!` macro no longer exists. It was developed in a bygone era as a
+compiler plugin during the infancy of the regex crate. Back then, then only
+matching engine in the crate was the Pike VM. The `regex!` macro was, itself,
+also a Pike VM. The only advantages it offered over the dynamic Pike VM that
+was built at runtime were the following:
+
+1. Syntax checking was done at compile time. Your Rust program wouldn't
+   compile if your regex didn't compile.
+2. Reduction of overhead that was proportional to the size of the regex.
+   For the most part, this overhead consisted of heap allocation, which
+   was nearly eliminated in the compiler plugin.
+
+The main takeaway here is that the compiler plugin was a marginally faster
+version of a slow regex engine. As the regex crate evolved, it grew other regex
+engines (DFA, bounded backtracker) and sophisticated literal optimizations.
+The regex macro didn't keep pace, and it therefore became (dramatically) slower
+than the dynamic engines. The only reason left to use it was for the compile
+time guarantee that your regex is correct. Fortunately, Clippy (the Rust lint
+tool) has a lint that checks your regular expression validity, which mostly
+replaces that use case.
+
+Additionally, the regex compiler plugin stopped receiving maintenance. Nobody
+complained. At that point, it seemed prudent to just remove it.
+
+Will a compiler plugin be brought back? The future is murky, but there is
+definitely an opportunity there to build something that is faster than the
+dynamic engines in some cases. But it will be challenging! As of now, there
+are no plans to work on this.


 ## Testing
@@ -236,7 +235,6 @@ the AT&T test suite) and code generate tests for each matching engine. The
 approach we use in this library is to create a Cargo.toml entry point for each
 matching engine we want to test. The entry points are:

-* `tests/test_plugin.rs` - tests the `regex!` macro
 * `tests/test_default.rs` - tests `Regex::new`
 * `tests/test_default_bytes.rs` - tests `bytes::Regex::new`
 * `tests/test_nfa.rs` - tests `Regex::new`, forced to use the NFA
@@ -261,18 +259,14 @@ entry points, it can take a while to compile everything. To reduce compile
 times slightly, try using `cargo test --test default`, which will only use the
 `tests/test_default.rs` entry point.

-N.B. To run tests for the `regex!` macro, use:
-
-    cargo test --manifest-path regex_macros/Cargo.toml
-

 ## Benchmarking

 The benchmarking in this crate is made up of many micro-benchmarks. Currently,
 there are two primary sets of benchmarks: the benchmarks that were adopted
-at this library's inception (in `benches/src/misc.rs`) and a newer set of
+at this library's inception (in `bench/src/misc.rs`) and a newer set of
 benchmarks meant to test various optimizations. Specifically, the latter set
-contain some analysis and are in `benches/src/sherlock.rs`. Also, the latter
+contain some analysis and are in `bench/src/sherlock.rs`. Also, the latter
 set are all executed on the same lengthy input whereas the former benchmarks
 are executed on strings of varying length.

@@ -284,7 +278,6 @@ separately from the main regex crate.
 Benchmarking follows a similarly wonky setup as tests. There are multiple entry
 points:

-* `bench_rust_plugin.rs` - benchmarks the `regex!` macro
 * `bench_rust.rs` - benchmarks `Regex::new`
 * `bench_rust_bytes.rs` benchmarks `bytes::Regex::new`
 * `bench_pcre.rs` - benchmarks PCRE
@@ -299,36 +292,36 @@ library benchmarks (especially RE2).
 If you're hacking on one of the matching engines and just want to see
 benchmarks, then all you need to run is:

-    $ ./run-bench rust
+    $ ./bench/run rust

 If you want to compare your results with older benchmarks, then try:

-    $ ./run-bench rust | tee old
+    $ ./bench/run rust | tee old
     $ ... make it faster
-    $ ./run-bench rust | tee new
-    $ cargo-benchcmp old new --improvements
+    $ ./bench/run rust | tee new
+    $ cargo benchcmp old new --improvements

 The `cargo-benchcmp` utility is available here:
 https://github.com/BurntSushi/cargo-benchcmp

-The `run-bench` utility can run benchmarks for PCRE and Oniguruma too. See
-`./run-bench --help`.
+The `./bench/run` utility can run benchmarks for PCRE and Oniguruma too. See
+`./bench/bench --help`.

 ## Dev Docs

 When digging your teeth into the codebase for the first time, the
 crate documentation can be a great resource. By default `rustdoc`
 will strip out all documentation of private crate members in an
 effort to help consumers of the crate focus on the *interface*
-without having to concern themselves with the *implimentation*.
+without having to concern themselves with the *implementation*.
 Normally this is a great thing, but if you want to start hacking
 on regex internals it is not what you want. Many of the private members
 of this crate are well documented with rustdoc style comments, and
 it would be a shame to miss out on the opportunity that presents.
 You can generate the private docs with:

 ```
-> rustdoc --crate-name docs src/lib.rs -o target/doc -L target/debug/deps --no-defaults --passes collapse-docs --passes unindent-comments
+$ rustdoc --crate-name docs src/lib.rs -o target/doc -L target/debug/deps --no-defaults --passes collapse-docs --passes unindent-comments
 ```

 Then just point your browser at `target/doc/regex/index.html`.

PERFORMANCE.md

+1-1
@@ -2,7 +2,7 @@ Your friendly guide to understanding the performance characteristics of this
 crate.

 This guide assumes some familiarity with the public API of this crate, which
-can be found here: http://doc.rust-lang.org/regex/regex/index.html
+can be found here: https://docs.rs/regex

 ## Theory vs. Practice

README.md

+3-35
@@ -14,13 +14,13 @@ by [RE2](https://github.com/google/re2).

 ### Documentation

-[Module documentation with examples](https://doc.rust-lang.org/regex).
+[Module documentation with examples](https://docs.rs/regex).
 The module documentation also include a comprehensive description of the syntax
 supported.

 Documentation with examples for the various matching functions and iterators
 can be found on the
-[`Regex` type](https://doc.rust-lang.org/regex/regex/struct.Regex.html).
+[`Regex` type](https://docs.rs/regex/*/regex/struct.Regex.html).

 ### Usage

@@ -188,37 +188,6 @@ assert!(!matches.matched(5));
 assert!(matches.matched(6));
 ```

-### Usage: `regex!` compiler plugin
-
-**WARNING**: The `regex!` compiler plugin is orders of magnitude slower than
-the normal `Regex::new(...)` usage. You should not use the compiler plugin
-unless you have a very special reason for doing so. The performance difference
-may be the temporary, but the path forward at this point isn't clear.
-
-The `regex!` compiler plugin will compile your regexes at compile time. **This
-only works with a nightly compiler.**
-
-Here is a small example:
-
-```rust
-#![feature(plugin)]
-
-#![plugin(regex_macros)]
-extern crate regex;
-
-fn main() {
-    let re = regex!(r"(\d{4})-(\d{2})-(\d{2})");
-    let caps = re.captures("2010-03-14").unwrap();
-
-    assert_eq!("2010", caps[1]);
-    assert_eq!("03", caps[2]);
-    assert_eq!("14", caps[3]);
-}
-```
-
-Notice that we never `unwrap` the result of `regex!`. This is because your
-*program* won't compile if the regex doesn't compile. (Try `regex!("(")`.)
-

 ### Usage: a regular expression parser

@@ -228,8 +197,7 @@ execution. This may be useful if you're implementing your own regex engine or
 otherwise need to do analysis on the syntax of a regular expression. It is
 otherwise not recommended for general use.

-[Documentation for `regex-syntax` with
-examples](https://doc.rust-lang.org/regex/regex_syntax/index.html).
+[Documentation for `regex-syntax` with examples](https://docs.rs/regex-syntax).

 # License

appveyor.yml

+4-2
@@ -10,8 +10,10 @@ install:
   - SET PATH=%PATH%;C:\MinGW\bin
   - rustc -V
   - cargo -V
-
 build: false
-
 test_script:
   - cargo test --verbose --jobs 4
+branches:
+  only:
+    - master
+    - auto

bench/Cargo.toml

+10-20
@@ -5,26 +5,27 @@ version = "0.1.0"
 authors = ["The Rust Project Developers"]
 license = "MIT/Apache-2.0"
 repository = "https://github.com/rust-lang/regex"
-documentation = "http://doc.rust-lang.org/regex/regex/index.html"
+documentation = "https://docs.rs/regex"
 homepage = "https://github.com/rust-lang/regex"
 description = "Regex benchmarks for Rust's and other engines."
 build = "build.rs"
+workspace = ".."

 [dependencies]
-docopt = "0.6"
-lazy_static = "0.1"
+docopt = "0.8"
+lazy_static = "1"
 libc = "0.2"
-onig = { version = "1.2", optional = true }
+onig = { version = "3", optional = true }
 libpcre-sys = { version = "0.2", optional = true }
 memmap = "0.2"
 regex = { version = "0.2.0", path = "..", features = ["simd-accel"] }
-regex_macros = { version = "0.2.0", path = "../regex_macros", optional = true }
 regex-syntax = { version = "0.4.0", path = "../regex-syntax" }
-rustc-serialize = "0.3"
+serde = "1"
+serde_derive = "1"

 [build-dependencies]
-gcc = "0.3"
-pkg-config = "0.3"
+cc = "1"
+pkg-config = "0.3.9"

 [[bin]]
 name = "regex-run-one"
@@ -40,29 +41,18 @@ bench = false
 # Doing anything else will probably result in weird "duplicate definition"
 # compiler errors.
 #
-# Tip: use the run-bench script in the root of this repository to run
-# benchmarks.
+# Tip: use the `bench/run` script (in this directory) to run benchmarks.
 [features]
 re-pcre1 = ["libpcre-sys"]
 re-pcre2 = []
 re-onig = ["onig"]
 re-re2 = []
 re-rust = []
 re-rust-bytes = []
-re-rust-plugin = ["regex_macros"]
 re-tcl = []

 [[bench]]
 name = "bench"
 path = "src/bench.rs"
 test = false
 bench = true
-
-[profile.release]
-debug = true
-
-[profile.bench]
-debug = true
-
-[profile.test]
-debug = true
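
Note on the profile move: the `[profile.*]` sections removed from `bench/Cargo.toml` reappear in the root `Cargo.toml` in this same commit. This fits Cargo's workspace behavior, where profile settings are read only from the workspace root manifest and are ignored in member manifests. The fragments below are a minimal sketch of the resulting layout, showing only the keys this commit touches. Everything else is elided, and the comments are illustrative rather than part of the commit.

```toml
# Root Cargo.toml (fragment)
[workspace]
members = ["bench", "regex-capi", "regex-debug", "regex-syntax"]

# Profiles now live here, since Cargo reads them from the workspace root.
[profile.release]
debug = true

[profile.bench]
debug = true

[profile.test]
debug = true
```

```toml
# bench/Cargo.toml (fragment)
[package]
# Other package keys (version, authors, build, and so on) are elided.
# Points this member at the workspace root one directory up.
workspace = ".."
```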
