Skip to content

Commit 431fa24

Browse files
adpaco-awstedinski
authored andcommitted
Documentation for testing (rust-lang#592)
* Docs for testing * Address review comments
1 parent f77263a commit 431fa24

File tree

3 files changed

+211
-12
lines changed

3 files changed

+211
-12
lines changed

rmc-docs/src/SUMMARY.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -16,5 +16,6 @@
1616
- [Where to start on real code](./tutorial-real-code.md)
1717

1818
- [RMC developer documentation](./dev-documentation.md)
19+
- [Testing](./rmc-testing.md)
1920

2021
- [RMC dashboard](./dashboard.md)

rmc-docs/src/dashboard.md

Lines changed: 45 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -13,30 +13,63 @@ However, not all examples from these books are suited for verification.
1313
For instance, some of them are only included to show what is valid Rust code (or what is not).
1414

1515
Because of that, we run up to three different types of jobs when generating the dashboard:
16-
* `check` jobs (`BUILD` stage): This check uses the Rust front-end to detect if the example is valid Rust code.
17-
* `codegen` jobs (`TEST` stage): This check uses the RMC back-end to determine if we can generate GotoC code.
18-
* `verification` jobs (`REPORT` stage): This check uses CBMC to obtain a verification result.
16+
* `check` jobs: This check uses the Rust front-end to detect if the example is valid Rust code.
17+
* `codegen` jobs: This check uses the RMC back-end to determine if we can generate GotoC code.
18+
* `verification` jobs: This check uses CBMC to obtain a verification result.
1919

2020
Note that these are incremental: A `verification` job depends on a previous `codegen` job.
2121
Similary, a `codegen` job depends on a `check` job.
2222

23-
> **Warning:** [Litani](https://github.com/awslabs/aws-build-accumulator) does not support
24-
> hierarchical views nor custom stages at the moment. For this reason, the results are
25-
> displayed for each example using Litani's default stages (`BUILD`, `TEST` and `REPORT`).
23+
> **Warning:** [Litani](https://github.com/awslabs/aws-build-accumulator) does
24+
> not support hierarchical views at the moment. For this reason, we are
25+
> publishing a [text version of the dashboard](./dashboard/dashboard.txt) which
26+
> displays the same results in a hierarchical way while we work on adding more
27+
> features to [Litani](https://github.com/awslabs/aws-build-accumulator).
2628
2729
Before running the above mentioned jobs, we pre-process the examples to:
2830
1. Set the expected output according to flags present in the code snippet.
2931
2. Add any required compiler/RMC flags (e.g., CBMC unwinding flags).
30-
3. Include custom assertions for verification (only in the case of `verification` jobs).
3132

3233
Finally, we run all jobs, collect their outputs and compare them against the expected outputs.
3334
The results are summarized as follows: If the obtained and expected outputs differ,
3435
the color of the stage bar will be red. Otherwise, it will be blue.
3536
If an example shows one red bar, it is considered a failed example that cannot be handled by RMC.
3637

37-
The [RMC Dashboard](./dashboard/index.html) is automatically updated whenever
38-
a PR gets merged into RMC.
38+
The [RMC Dashboard](./dashboard/index.html) and [its text version](./dashboard/dashboard.txt) are
39+
automatically updated whenever a PR gets merged into RMC.
3940

40-
> **Tip:** In addition, we publish a [text version of the dashboard](./dashboard/dashboard.txt)
41-
> while we work on adding more features to [Litani](https://github.com/awslabs/aws-build-accumulator).
42-
> The [text-based dashboard](./dashboard/dashboard.txt) displays the same results in hierarchical way.
41+
## The dashboard procedure
42+
43+
This section describes how the dashboard operates at a high level.
44+
45+
To kick off the dashboard process use
46+
47+
```
48+
./x.py run -i --stage 1 dashboard
49+
```
50+
51+
The main function of the dashboard is `generate_dashboard()` in
52+
[`src/tools/dashboard/src/books.rs`](https://github.com/model-checking/rmc/blob/main/src/tools/dashboard/src/books.rs),
53+
which follows these steps:
54+
* First, it calls the different `parse_..._hierarchy()` functions which parse
55+
the summary files for each book.
56+
* The `extract_examples(...)` function uses `rustdoc` to extract all examples
57+
from the books.
58+
* Then for each example it will check if there is a corresponding `.props` file
59+
in `src/tools/dashboard/configs/`. The contents of these files (e.g.,
60+
command-line options) are prepended to the example.
61+
* All examples are written in the `src/test/dashboard/books/` folder.
62+
63+
In general, the path to a given example is
64+
`src/test/dashboard/books/<book>/<chapter>/<section>/<subsection>/<line>.rs`
65+
where `<line>` is the line number where the example appears in the
66+
documentation. The `.props` files mentioned above follow the same naming
67+
scheme in order to match them and detect conflicts.
68+
69+
* Then all examples are run using
70+
[Litani](https://github.com/awslabs/aws-build-accumulator).
71+
* Finally, the Litani log is used to generate the [text version of the
72+
dashboard](./dashboard/dashboard.txt).
73+
74+
> **Warning:** Note that any changes done to the examples in
75+
> `src/test/dashboard/books/` may be gone if the dashboard is executed.

rmc-docs/src/rmc-testing.md

Lines changed: 165 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,165 @@
1+
# Testing
2+
3+
Testing in RMC is carried out in multiple ways. There are at least
4+
two very good reasons to do it:
5+
1. **Software regression**: A regression is a type of bug
6+
that appears after a change is introduced where a feature that
7+
was previously working has unexpectedly stopped working.
8+
9+
Regression testing allows one to prevent a software regression
10+
from happening by running a comprehensive set of working tests
11+
before any change is committed to the project.
12+
2. **Software metrics**: A metric is a measure of software
13+
characteristics which are quantitative and countable. Metrics are
14+
particularly valuable for project management purposes.
15+
16+
We recommend reading our section on [Regression Testing](#regression-testing)
17+
if you are interested in RMC development. At present, we obtain metrics based
18+
on the [RMC dashboard](./dashboard.md).
19+
20+
# Regression testing
21+
22+
RMC relies on a quite extensive range of tests to perform regression testing.
23+
Regression testing can be executed by running the command:
24+
25+
```
26+
./scripts/rmc-regression.sh
27+
```
28+
29+
The `rmc-regression.sh` script executes different testing commands, which we classify into:
30+
* [RMC testing suites](#rmc-testing-suites)
31+
* [Rust unit tests](#rust-unit-tests)
32+
* [Script-based tests](#script-based-tests)
33+
34+
See below for a description of each one.
35+
36+
Note that regression testing is run whenever a Pull Request is opened, updated or merged
37+
into the main branch. Therefore, it is a good idea to run regression testing locally before
38+
submitting a Pull Request for RMC.
39+
40+
## RMC testing suites
41+
42+
The RMC testing suites are the main testing resource for RMC. In most cases, the
43+
tests contained in the RMC testing suites are single Rust files that are run
44+
using the following command:
45+
46+
```
47+
rmc file.rs <options>
48+
```
49+
50+
Command-line options `<options>` can be passed to the test by adding a special
51+
comment to the file.
52+
Read more about it in the [Testingoptions](#testing-options) section.
53+
54+
In particular, the RMC testing suites are composed of:
55+
* `rmc`: The main testing suite for RMC. The test is a single Rust file that is
56+
run through RMC. In general, the test passes if verification with RMC
57+
is successful, otherwise it fails.
58+
* `firecracker`: Works like `rmc` but contains tests inspired by
59+
[Firecracker](https://github.com/firecracker-microvm/firecracker) code.
60+
* `prusti`: Works like `rmc` but contains tests from the
61+
[Prusti](https://github.com/viperproject/prusti-dev) tool.
62+
* `smack`: Works like `rmc` but contains tests from the
63+
[SMACK](https://github.com/smackers/smack) tool.
64+
* `expected`: Similar to `rmc` but with an additional check which ensures that
65+
lines appearing in `*.expected` files appear in the output
66+
generated by `rmc`.
67+
* `cargo-rmc`: This suite is designed to test the `cargo-rmc` command. As such,
68+
this suite works with packages instead of single Rust files.
69+
Flags can be specified in the `Cargo.toml` configuration file.
70+
Similar to the `expected` suite, we look for `*.expected` files
71+
for each function under test.
72+
73+
We have extended
74+
[`compiletest`](https://rustc-dev-guide.rust-lang.org/tests/intro.html) (the
75+
Rust compiler testing framework) to work with these suites. That way, we take
76+
advantage of all `compiletest` features (e.g., parallel execution).
77+
78+
### Testing stages
79+
80+
The process of running single-file tests is split into three stages:
81+
* `check`: This stage uses the Rust front-end to detect if the example is valid
82+
Rust code.
83+
* `codegen`: This stage uses the RMC back-end to determine if we can generate
84+
GotoC code.
85+
* `verify`: This stage uses CBMC to obtain a verification result.
86+
87+
If a test fails, the error message will include the stage where it failed:
88+
89+
```
90+
error: test failed: expected check success, got failure
91+
```
92+
93+
When working on a test that is expected to fail, there are two options to
94+
indicate an expected failure. The first one is to add a comment
95+
96+
```rust
97+
// rmc-<stage>-fail
98+
```
99+
at the top of the test file, where `<stage>` is the stage where the test is
100+
expected to fail.
101+
102+
The other option is to use the predicate `rmc::expect_fail(cond, message)`
103+
included in the RMC library. The `cond` in `rmc::expect_fail` is a condition
104+
that you expect not to hold during verification. The testing framework expects
105+
one `EXPECTED FAIL` message in the verification output for each use of the
106+
predicate.
107+
108+
> **Warning:** Note that `rmc::expect_fail` is only useful to indicate
109+
> failure in the `verify` stage, errors in other stages will be considered
110+
> testing failures.
111+
112+
### Testing options
113+
114+
Many tests will require passing command-line options to RMC. These options can
115+
be specified in single Rust files by adding a comment at the top of the file:
116+
```
117+
// rmc-flags: <options>
118+
```
119+
120+
For example, to increase the unwinding value to 4 in a test, we can write:
121+
122+
```
123+
// rmc-flags: --cbmc-args --unwind 4
124+
```
125+
126+
Alternatively, CBMC flags can also be passed using `cbmc-flags`:
127+
128+
```
129+
// cbmc-flags: --unwind 4
130+
```
131+
132+
> **Warning:** `cbmc-flags` is likely to be deprecated in the near future. We
133+
> recommend using `rmc-flags` with `--cbmc-args` for now.
134+
135+
For `cargo-rmc` tests, the preferred way to pass command-line options is adding
136+
them to `Cargo.toml` below the `[rmc.flags]` marker.
137+
138+
## Rust unit tests
139+
140+
These tests follow the
141+
[Rust unit testing](https://doc.rust-lang.org/rust-by-example/testing/unit_testing.html)
142+
style.
143+
144+
At present, RMC only uses unit tests in the
145+
[cbmc crate](https://github.com/model-checking/rmc/tree/main/compiler/cbmc)
146+
to test the
147+
[identity symbol table transformer](https://github.com/model-checking/rmc/blob/main/compiler/cbmc/src/goto_program/symtab_transformer/identity_transformer.rs).
148+
149+
## Script-based tests
150+
151+
These are tests which are run using scripts. Scripting gives us the ability to
152+
perform ad-hoc checks that cannot be done otherwise. They are currently used
153+
for:
154+
* Standard library codegen
155+
* Firecracker virtio codegen
156+
* Diamond dependency
157+
* Type mismatch
158+
159+
In fact, most of them are equivalent to running `cargo rmc` and performing
160+
checks on the output. The downside to scripting is that these tests will always
161+
be run, even if there have not been any changes since the last time the
162+
regression was run.
163+
164+
> **Warning:** `cargo rmc` is under heavy development at the moment. Because of
165+
> that, this section may become outdated soon.

0 commit comments

Comments
 (0)