
Commit 8411e22

Author: Ethan Pailes
document compile contract
The internal documentation is a little lighter than usual in `src/compile.rs`, so I wrote up the impression of the compilation contract I developed by reading the code. This is honestly a thinly veiled way to check my own understanding of what is going on. Hopefully it will be valuable to the next person who starts digging into the compiler.
1 parent 37bfbd9 commit 8411e22

2 files changed: +74 -0 lines changed


HACKING.md

+22
@@ -313,3 +313,25 @@ https://github.com/BurntSushi/cargo-benchcmp
 
 The `run-bench` utility can run benchmarks for PCRE and Oniguruma too. See
 `./run-bench --help`.
+
+## Dev Docs
+
+When digging your teeth into the codebase for the first time, the
+crate documentation can be a great resource. By default, `rustdoc`
+will strip out all documentation of private crate members in an
+effort to help consumers of the crate focus on the *interface*
+without having to concern themselves with the *implementation*.
+Normally this is a great thing, but if you want to start hacking
+on regex internals it is not what you want. Many of the private members
+of this crate are well documented with rustdoc-style comments, and
+it would be a shame to miss out on the opportunity that presents.
+You can generate the private docs with:
+
+```
+> rustdoc --crate-name docs src/lib.rs -o target/doc -L target/debug/deps --no-defaults --passes collapse-docs --passes unindent-comments
+```
+
+Then just point your browser at `target/doc/regex/index.html`.
+
+See https://github.com/rust-lang/rust/issues/15347 for more info
+about generating developer docs for internal use.

src/compile.rs

+52
@@ -205,6 +205,58 @@ impl Compiler {
         Ok(self.compiled)
     }
 
+    /// Compile expr into self.insts, returning a patch on success,
+    /// or an error if we run out of memory.
+    ///
+    /// All of the c_* methods of the compiler share the contract outlined
+    /// here.
+    ///
+    /// The main thing that a c_* method does is mutate `self.insts`
+    /// to add a list of mostly compiled instructions required to execute
+    /// the given expression. `self.insts` contains MaybeInsts rather than
+    /// Insts because there is some backpatching required.
+    ///
+    /// The `Patch` value returned by each c_* method provides metadata
+    /// about the compiled instructions emitted to `self.insts`. The
+    /// `entry` member of the patch refers to the first instruction
+    /// (the entry point), while the `hole` member contains zero or
+    /// more offsets to partial instructions that need to be backpatched.
+    /// The c_* routine can't know where its list of instructions is going to
+    /// jump to after execution, so it is up to the caller to patch
+    /// these jumps to point to the right place. So compiling some
+    /// expression, e, we would end up with a situation that looked like:
+    ///
+    /// ```text
+    /// self.insts = [ ..., i1, i2, ..., iexit1, ..., iexitn, ...]
+    ///                     ^               ^            ^
+    ///                     |                \          /
+    ///                   entry               \        /
+    ///                                         hole
+    /// ```
+    ///
+    /// To compile two expressions, e1 and e2, concatenated together we
+    /// would do:
+    ///
+    /// ```ignore
+    /// let patch1 = self.c(e1);
+    /// let patch2 = self.c(e2);
+    /// ```
+    ///
+    /// which leaves us with a situation that looks like
+    ///
+    /// ```text
+    /// self.insts = [ ..., i1, ..., iexit1, ..., i2, ..., iexit2 ]
+    ///                     ^          ^          ^          ^
+    ///                     |          |          |          |
+    ///                   entry1     hole1      entry2     hole2
+    /// ```
+    ///
+    /// Then to merge the two patches together into one we would backpatch
+    /// hole1 with entry2 and return a new patch that enters at entry1
+    /// and has hole2 for a hole. In fact, if you look at the c_concat
+    /// method you will see that it does exactly this, though it handles
+    /// a list of expressions rather than just the two that we use for
+    /// an example.
     fn c(&mut self, expr: &Expr) -> Result {
         use prog;
         use syntax::Expr::*;
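
The contract described in the doc comment above can be illustrated with a small standalone sketch. This is not the code in `src/compile.rs`: the two-variant `MaybeInst`, the `holes: Vec<InstPtr>` field, and the `c_char`, `fill`, and `demo` helpers are simplified stand-ins assumed for illustration, and `c_concat` here takes a slice of characters rather than a list of expressions. It only shows the shape of the idea: each c_* method appends partially compiled instructions, returns an entry point plus holes, and the caller backpatches the holes.

```rust
/// Index of an instruction slot in `insts`.
type InstPtr = usize;

/// Simplified stand-in for the crate's `MaybeInst` (hypothetical shape).
#[derive(Clone, Copy, Debug, PartialEq)]
enum MaybeInst {
    /// Fully compiled: match `c`, then jump to `goto`.
    Compiled { c: char, goto: InstPtr },
    /// Partially compiled: the jump target is not known yet.
    Uncompiled { c: char },
}

/// Metadata about a freshly emitted run of instructions.
#[derive(Debug, PartialEq)]
struct Patch {
    /// First instruction of the run.
    entry: InstPtr,
    /// Instructions whose jump target the caller must still fill in.
    holes: Vec<InstPtr>,
}

#[derive(Default)]
struct Compiler {
    insts: Vec<MaybeInst>,
}

impl Compiler {
    /// Emit one instruction for a literal char; its jump is left as a hole.
    fn c_char(&mut self, c: char) -> Patch {
        let entry = self.insts.len();
        self.insts.push(MaybeInst::Uncompiled { c });
        Patch { entry, holes: vec![entry] }
    }

    /// Backpatch: point every hole at `goto`, completing those instructions.
    fn fill(&mut self, holes: &[InstPtr], goto: InstPtr) {
        for &pc in holes {
            if let MaybeInst::Uncompiled { c } = self.insts[pc] {
                self.insts[pc] = MaybeInst::Compiled { c, goto };
            }
        }
    }

    /// Concatenate: fill each patch's holes with the next patch's entry,
    /// then return the first entry together with the last set of holes.
    fn c_concat(&mut self, chars: &[char]) -> Option<Patch> {
        let mut iter = chars.iter();
        let first = self.c_char(*iter.next()?);
        let entry = first.entry;
        let mut holes = first.holes;
        for &c in iter {
            let next = self.c_char(c);
            self.fill(&holes, next.entry);
            holes = next.holes;
        }
        Some(Patch { entry, holes })
    }
}

/// Compile the concatenation "ab" and return the instructions and patch.
fn demo() -> (Vec<MaybeInst>, Patch) {
    let mut compiler = Compiler::default();
    let patch = compiler.c_concat(&['a', 'b']).expect("non-empty input");
    (compiler.insts, patch)
}
```

In this sketch, `demo()` leaves the instruction for `'a'` compiled with its jump pointing at the instruction for `'b'`, while the instruction for `'b'` remains a hole in the returned patch, ready for whoever called `c_concat` to fill.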
