Commit 53b1c48

Auto merge of #8087 - ehuss:freshness-interrupted2, r=alexcrichton
Fix freshness when linking is interrupted.

Fixes a scenario where hitting Ctrl-C while linking would leave a corrupted executable, but Cargo would think it is "fresh" and fail to rebuild it.

This also includes a separate commit which adds more documentation on fingerprinting.

Fixes #7767
2 parents 7d720ef + 14e86cc commit 53b1c48

3 files changed: +288 -47 lines changed

src/cargo/core/compiler/context/compilation_files.rs

+16-1
@@ -13,14 +13,20 @@ use crate::core::compiler::{CompileMode, CompileTarget, Unit};
1313
use crate::core::{Target, TargetKind, Workspace};
1414
use crate::util::{self, CargoResult};
1515

16-
/// The `Metadata` is a hash used to make unique file names for each unit in a build.
16+
/// The `Metadata` is a hash used to make unique file names for each unit in a
17+
/// build. It is also used for symbol mangling.
18+
///
1719
/// For example:
1820
/// - A project may depend on crate `A` and crate `B`, so the package name must be in the file name.
1921
/// - Similarly a project may depend on two versions of `A`, so the version must be in the file name.
22+
///
2023
/// In general this must include all things that need to be distinguished in different parts of
2124
/// the same build. This is absolutely required or we overwrite things before
2225
/// we get a chance to use them.
2326
///
27+
/// It is also used for symbol mangling, because if you have two versions of
28+
/// the same crate linked together, their symbols need to be differentiated.
29+
///
2430
/// We use a hash because it is an easy way to guarantee
2531
/// that all the inputs can be converted to a valid path.
2632
///
@@ -39,6 +45,15 @@ use crate::util::{self, CargoResult};
3945
/// more space than needed. This makes not including something in `Metadata`
4046
/// a form of cache invalidation.
4147
///
48+
/// You should also avoid anything that would interfere with reproducible
49+
/// builds. For example, *any* absolute path should be avoided. This is one
50+
/// reason that `RUSTFLAGS` is not in `Metadata`, because it often has
51+
/// absolute paths (like `--remap-path-prefix` which is fundamentally used for
52+
/// reproducible builds and has absolute paths in it). Also, in some cases the
53+
/// mangled symbols need to be stable between different builds with different
54+
/// settings. For example, profile-guided optimizations need to swap
55+
/// `RUSTFLAGS` between runs, but need to keep the same symbol names.
56+
///
4257
/// Note that the `Fingerprint` is in charge of tracking everything needed to determine if a
4358
/// rebuild is needed.
4459
#[derive(Copy, Clone, Hash, Eq, PartialEq, Ord, PartialOrd)]
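The doc comment above explains why a hash is embedded in output file names. Below is a minimal sketch of that idea, not Cargo's implementation: `UnitKey`, `metadata_hash`, and `file_stem` are hypothetical names, and the standard library's `DefaultHasher` stands in for whatever hasher Cargo actually uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the inputs that must be distinguished
/// within a single build (the real `Metadata` hashes more than this).
#[derive(Hash)]
struct UnitKey<'a> {
    name: &'a str,
    version: &'a str,
}

/// Hash the key and render it as fixed-width hex, so the result is
/// always a valid path component regardless of the inputs.
fn metadata_hash(key: &UnitKey<'_>) -> String {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Build an output file stem that embeds the hash.
fn file_stem(key: &UnitKey<'_>) -> String {
    format!("lib{}-{}", key.name, metadata_hash(key))
}

fn main() {
    // Two versions of the same crate get different file names,
    // so neither output overwrites the other.
    let a1 = UnitKey { name: "a", version: "1.0.0" };
    let a2 = UnitKey { name: "a", version: "2.0.0" };
    println!("{}", file_stem(&a1));
    println!("{}", file_stem(&a2));
}
```

Leaving a value such as `RUSTFLAGS` out of the key keeps the file name (and, in Cargo's case, the mangled symbols) stable when only that value changes, which is the trade-off the comment describes.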

src/cargo/core/compiler/fingerprint.rs

+187-46
@@ -5,23 +5,30 @@
55
//! (needs to be recompiled) or "fresh" (it does not need to be recompiled).
66
//! There are several mechanisms that influence a Unit's freshness:
77
//!
8-
//! - The `Metadata` hash isolates each Unit on the filesystem by being
9-
//! embedded in the filename. If something in the hash changes, then the
10-
//! output files will be missing, and the Unit will be dirty (missing
11-
//! outputs are considered "dirty").
12-
//! - The `Fingerprint` is another hash, saved to the filesystem in the
13-
//! `.fingerprint` directory, that tracks information about the inputs to a
14-
//! Unit. If any of the inputs changes from the last compilation, then the
15-
//! Unit is considered dirty. A missing fingerprint (such as during the
16-
//! first build) is also considered dirty.
17-
//! - Whether or not input files are actually present. For example a build
18-
//! script which says it depends on a nonexistent file `foo` is always rerun.
19-
//! - Propagation throughout the dependency graph of file modification time
20-
//! information, used to detect changes on the filesystem. Each `Fingerprint`
21-
//! keeps track of what files it'll be processing, and when necessary it will
22-
//! check the `mtime` of each file (last modification time) and compare it to
23-
//! dependencies and output to see if files have been changed or if a change
24-
//! needs to force recompiles of downstream dependencies.
8+
//! - The `Fingerprint` is a hash, saved to the filesystem in the
9+
//! `.fingerprint` directory, that tracks information about the Unit. If the
10+
//! fingerprint is missing (such as the first time the unit is being
11+
//! compiled), then the unit is dirty. If any of the fingerprint fields
12+
//! change (like the name of the source file), then the Unit is considered
13+
//! dirty.
14+
//!
15+
//! The `Fingerprint` also tracks the fingerprints of all its dependencies,
16+
//! so a change in a dependency will propagate the "dirty" status up.
17+
//!
18+
//! - Filesystem mtime tracking is also used to check if a unit is dirty.
19+
//! See the section below on "Mtime comparison" for more details. There
20+
//! are essentially two parts to mtime tracking:
21+
//!
22+
//! 1. The mtime of a Unit's output files is compared to the mtime of all
23+
//! its dependencies' output file mtimes (see `check_filesystem`). If any
24+
//! output is missing, or is older than a dependency's output, then the
25+
//! unit is dirty.
26+
//! 2. The mtime of a Unit's source files is compared to the mtime of its
27+
//! dep-info file in the fingerprint directory (see `find_stale_file`).
28+
//! The dep-info file is used as an anchor to know when the last build of
29+
//! the unit was done. See the "dep-info files" section below for more
30+
//! details. If any input files are missing, or are newer than the
31+
//! dep-info, then the unit is dirty.
2532
//!
2633
//! Note: Fingerprinting is not a perfect solution. Filesystem mtime tracking
2734
//! is notoriously imprecise and problematic. Only a small part of the
@@ -33,11 +40,16 @@
3340
//!
3441
//! ## Fingerprints and Metadata
3542
//!
43+
//! The `Metadata` hash is a hash added to the output filenames to isolate
44+
//! each unit. See the documentation in the `compilation_files` module for
45+
//! more details. NOTE: Not all output files are isolated via filename hashes
46+
//! (like dylibs), but the fingerprint directory always has the `Metadata`
47+
//! hash in its directory name.
48+
//!
3649
//! Fingerprints and Metadata are similar, and track some of the same things.
3750
//! The Metadata contains information that is required to keep Units separate.
3851
//! The Fingerprint includes additional information that should cause a
39-
//! recompile, but it is desired to reuse the same filenames. Generally the
40-
//! items in the Metadata do not need to be in the Fingerprint. A comparison
52+
//! recompile, but it is desired to reuse the same filenames. A comparison
4153
//! of what is tracked:
4254
//!
4355
//! Value | Fingerprint | Metadata
@@ -54,8 +66,7 @@
5466
//! __CARGO_DEFAULT_LIB_METADATA[^4] | | ✓
5567
//! package_id | | ✓
5668
//! authors, description, homepage, repo | ✓ |
57-
//! Target src path | ✓ |
58-
//! Target path relative to ws | ✓ |
69+
//! Target src path relative to ws | ✓ |
5970
//! Target flags (test/bench/for_host/edition) | ✓ |
6071
//! -C incremental=… flag | ✓ |
6172
//! mtime of sources | ✓[^3] |
@@ -64,12 +75,19 @@
6475
//!
6576
//! [^1]: Build script and bin dependencies are not included.
6677
//!
67-
//! [^3]: The mtime is only tracked for workspace members and path
68-
//! dependencies. Git dependencies track the git revision.
78+
//! [^3]: See below for details on mtime tracking.
6979
//!
7080
//! [^4]: `__CARGO_DEFAULT_LIB_METADATA` is set by rustbuild to embed the
7181
//! release channel (bootstrap/stable/beta/nightly) in libstd.
7282
//!
83+
//! When deciding what should go in the Metadata vs the Fingerprint, consider
84+
//! that some files (like dylibs) do not have a hash in their filename. Thus,
85+
//! if a value changes, only the fingerprint will detect the change (consider,
86+
//! for example, swapping between different features). Fields that are only in
87+
//! Metadata generally aren't relevant to the fingerprint because they
88+
//! fundamentally change the output (like target vs host changes the directory
89+
//! where it is emitted).
90+
//!
7391
//! ## Fingerprint files
7492
//!
7593
//! Fingerprint information is stored in the
@@ -83,9 +101,7 @@
83101
//! `CARGO_LOG=cargo::core::compiler::fingerprint=trace cargo build` can be
84102
//! used to display this log information.
85103
//! - A "dep-info" file which contains a list of source filenames for the
86-
//! target. This is produced by reading the output of `rustc
87-
//! --emit=dep-info` and packing it into a condensed format. Cargo uses this
88-
//! to check the mtime of every file to see if any of them have changed.
104+
//! target. See below for details.
89105
//! - An `invoked.timestamp` file whose filesystem mtime is updated every time
90106
//! the Unit is built. This is an experimental feature used for cleaning
91107
//! unused artifacts.
@@ -110,6 +126,103 @@
110126
//! all dependencies, when it is updated, by using `Arc` clones, it
111127
//! automatically picks up the updates to its dependencies.
112128
//!
129+
//! ### dep-info files
130+
//!
131+
//! Cargo passes the `--emit=dep-info` flag to `rustc` so that `rustc` will
132+
//! generate a "dep info" file (with the `.d` extension). This is a
133+
//! Makefile-like syntax that includes all of the source files used to build
134+
//! the crate. This file is used by Cargo to know which files to check to see
135+
//! if the crate will need to be rebuilt.
136+
//!
137+
//! After `rustc` exits successfully, Cargo will read the dep info file and
138+
//! translate it into a binary format that is stored in the fingerprint
139+
//! directory (`translate_dep_info`). The mtime of the fingerprint dep-info
140+
//! file itself is used as the reference for comparing the source files to
141+
//! determine if any of the source files have been modified (see below for
142+
//! more detail).
143+
//!
144+
//! There is also a third dep-info file. Cargo will extend the file created by
145+
//! rustc with some additional information and save this into the output
146+
//! directory. This is intended for build system integration. See the
147+
//! `output_depinfo` module for more detail.
148+
//!
149+
//! #### -Zbinary-dep-depinfo
150+
//!
151+
//! `rustc` has an experimental flag `-Zbinary-dep-depinfo`. This causes
152+
//! `rustc` to include binary files (like rlibs) in the dep-info file. This is
153+
//! primarily to support rustc development, so that Cargo can check the
154+
//! implicit dependency to the standard library (which lives in the sysroot).
155+
//! We want Cargo to recompile whenever the standard library rlib/dylibs
156+
//! change, and this is a generic mechanism to make that work.
157+
//!
158+
//! ### Mtime comparison
159+
//!
160+
//! The use of modification timestamps is the most common way a unit will be
161+
//! determined to be dirty or fresh between builds. There are many subtle
162+
//! issues and edge cases with mtime comparisons. This gives a high-level
163+
//! overview, but you'll need to read the code for the gritty details. Mtime
164+
//! handling is different for different unit kinds. The different styles are
165+
//! driven by the `Fingerprint.local` field, which is set based on the unit
166+
//! kind.
167+
//!
168+
//! The status of whether or not the mtime is "stale" or "up-to-date" is
169+
//! stored in `Fingerprint.fs_status`.
170+
//!
171+
//! Each unit compares the mtime of its newest output file with the mtimes
172+
//! of the outputs of all its dependencies. If any output file is missing,
173+
//! then the unit is stale. If any dependency is newer, the unit is stale.
174+
//!
175+
//! #### Normal package mtime handling
176+
//!
177+
//! `LocalFingerprint::CheckDepInfo` is used for checking the mtime of
178+
//! packages. It compares the mtime of the input files (the source files) to
179+
//! the mtime of the dep-info file (which is written last after a build is
180+
//! finished). If the dep-info is missing, the unit is stale (it has never
181+
//! been built). The list of input files comes from the dep-info file. See the
182+
//! section above for details on dep-info files.
183+
//!
184+
//! Also note that although registry and git packages use `CheckDepInfo`, none
185+
//! of their source files are included in the dep-info (see
186+
//! `translate_dep_info`), so for those kinds no mtime checking is done
187+
//! (unless `-Zbinary-dep-depinfo` is used). Registry and git packages are
188+
//! static, so there is no need to check anything.
189+
//!
190+
//! When a build is complete, the mtime of the dep-info file in the
191+
//! fingerprint directory is modified to rewind it to the time when the build
192+
//! started. This is done by creating an `invoked.timestamp` file when the
193+
//! build starts to capture the start time. The mtime is rewound to the start
194+
//! to handle the case where the user modifies a source file while a build is
195+
//! running. Cargo can't know whether or not the file was included in the
196+
//! build, so it takes a conservative approach of assuming the file was *not*
197+
//! included, and it should be rebuilt during the next build.
198+
//!
199+
//! #### Rustdoc mtime handling
200+
//!
201+
//! Rustdoc does not emit a dep-info file, so Cargo currently has a relatively
202+
//! simple system for detecting rebuilds. `LocalFingerprint::Precalculated` is
203+
//! used for rustdoc units. For registry packages, this is the package
204+
//! version. For git packages, it is the git hash. For path packages, it is
205+
//! a string of the mtime of the newest file in the package.
206+
//!
207+
//! There are some known bugs with how this works, so it should be improved at
208+
//! some point.
209+
//!
210+
//! #### Build script mtime handling
211+
//!
212+
//! Build script mtime handling runs in different modes. There is the "old
213+
//! style" where the build script does not emit any `rerun-if` directives. In
214+
//! this mode, Cargo will use `LocalFingerprint::Precalculated`. See the
215+
//! "rustdoc" section above for how it works.
216+
//!
217+
//! In the "new style", each `rerun-if` directive is translated to the
218+
//! corresponding `LocalFingerprint` variant. The `RerunIfChanged` variant
219+
//! compares the mtime of the given filenames against the mtime of the
220+
//! "output" file.
221+
//!
222+
//! Similar to normal units, the build script "output" file mtime is rewound
223+
//! to the time just before the build script is executed to handle mid-build
224+
//! modifications.
225+
//!
113226
//! ## Considerations for inclusion in a fingerprint
114227
//!
115228
//! Over time we've realized a few items which historically were included in
@@ -277,6 +390,40 @@ pub fn prepare_target<'a, 'cfg>(
277390
return Ok(Job::new(Work::noop(), Fresh));
278391
}
279392

393+
// Clear out the old fingerprint file if it exists. This protects against
394+
// compilation being interrupted and leaving a corrupt file. For example, a
395+
// project with a lib.rs and integration test (two units):
396+
//
397+
// 1. Build the library and integration test.
398+
// 2. Make a change to lib.rs (NOT the integration test).
399+
// 3. Build the integration test, hit Ctrl-C while linking. With gcc, this
400+
// will leave behind an incomplete executable (zero size, or partially
401+
// written). NOTE: The library builds successfully, it is the linking
402+
// of the integration test that we are interrupting.
403+
// 4. Build the integration test again.
404+
//
405+
// Without the following line, step 3 will leave a valid fingerprint
406+
// on the disk. Then step 4 will think the integration test is "fresh"
407+
// because:
408+
//
409+
// - There is a valid fingerprint hash on disk (written in step 1).
410+
// - The mtime of the output file (the corrupt integration executable
411+
// written in step 3) is newer than all of its dependencies.
412+
// - The mtime of the integration test fingerprint dep-info file (written
413+
// in step 1) is newer than the integration test's source files, because
414+
// we haven't modified any of its source files.
415+
//
416+
// But the executable is corrupt and needs to be rebuilt. Clearing the
417+
// fingerprint at step 3 ensures that Cargo never mistakes a partially
418+
// written output as up-to-date.
419+
if loc.exists() {
420+
// Truncate instead of delete so that compare_old_fingerprint will
421+
// still log the reason for the fingerprint failure instead of just
422+
// reporting "failed to read fingerprint" during the next build if
423+
// this build fails.
424+
paths::write(&loc, b"")?;
425+
}
426+
280427
let write_fingerprint = if unit.mode.is_run_custom_build() {
281428
// For build scripts the `local` field of the fingerprint may change
282429
// while we're executing it. For example it could be in the legacy
@@ -484,9 +631,8 @@ impl<'de> Deserialize<'de> for DepFingerprint {
484631
#[derive(Debug, Serialize, Deserialize, Hash)]
485632
enum LocalFingerprint {
486633
/// This is a precalculated fingerprint which has an opaque string we just
487-
/// hash as usual. This variant is primarily used for git/crates.io
488-
/// dependencies where the source never changes so we can quickly conclude
489-
/// that there's some string we can hash and it won't really change much.
634+
/// hash as usual. This variant is primarily used for rustdoc where we
635+
/// don't have a dep-info file to compare against.
490636
///
491637
/// This is also used for build scripts with no `rerun-if-*` statements, but
492638
/// that's overall a mistake and causes bugs in Cargo. We shouldn't use this
@@ -1072,19 +1218,16 @@ fn calculate_normal<'a, 'cfg>(
10721218
.collect::<CargoResult<Vec<_>>>()?;
10731219
deps.sort_by(|a, b| a.pkg_id.cmp(&b.pkg_id));
10741220

1075-
// Afterwards calculate our own fingerprint information. We specially
1076-
// handle `path` packages to ensure we track files on the filesystem
1077-
// correctly, but otherwise upstream packages like from crates.io or git
1078-
// get bland fingerprints because they don't change without their
1079-
// `PackageId` changing.
1221+
// Afterwards calculate our own fingerprint information.
10801222
let target_root = target_root(cx);
1081-
let local = if use_dep_info(unit) {
1223+
let local = if unit.mode.is_doc() {
1224+
// rustdoc does not have dep-info files.
1225+
let fingerprint = pkg_fingerprint(cx.bcx, unit.pkg)?;
1226+
vec![LocalFingerprint::Precalculated(fingerprint)]
1227+
} else {
10821228
let dep_info = dep_info_loc(cx, unit);
10831229
let dep_info = dep_info.strip_prefix(&target_root).unwrap().to_path_buf();
10841230
vec![LocalFingerprint::CheckDepInfo { dep_info }]
1085-
} else {
1086-
let fingerprint = pkg_fingerprint(cx.bcx, unit.pkg)?;
1087-
vec![LocalFingerprint::Precalculated(fingerprint)]
10881231
};
10891232

10901233
// Figure out what the outputs of our unit are, and we'll be storing them
@@ -1128,12 +1271,6 @@ fn calculate_normal<'a, 'cfg>(
11281271
})
11291272
}
11301273

1131-
/// Whether or not the fingerprint should track the dependencies from the
1132-
/// dep-info file for this unit.
1133-
fn use_dep_info(unit: &Unit<'_>) -> bool {
1134-
!unit.mode.is_doc()
1135-
}
1136-
11371274
/// Calculate a fingerprint for an "execute a build script" unit. This is an
11381275
/// internal helper of `calculate`, don't call directly.
11391276
fn calculate_run_custom_build<'a, 'cfg>(
@@ -1412,7 +1549,10 @@ fn compare_old_fingerprint(
14121549
let old_fingerprint_json = paths::read(&loc.with_extension("json"))?;
14131550
let old_fingerprint: Fingerprint = serde_json::from_str(&old_fingerprint_json)
14141551
.chain_err(|| internal("failed to deserialize json"))?;
1415-
debug_assert_eq!(util::to_hex(old_fingerprint.hash()), old_fingerprint_short);
1552+
// Fingerprint can be empty after a failed rebuild (see comment in prepare_target).
1553+
if !old_fingerprint_short.is_empty() {
1554+
debug_assert_eq!(util::to_hex(old_fingerprint.hash()), old_fingerprint_short);
1555+
}
14161556
let result = new_fingerprint.compare(&old_fingerprint);
14171557
assert!(result.is_err());
14181558
result
@@ -1588,7 +1728,8 @@ impl DepInfoPathType {
15881728
/// included. If it is false, then package-relative paths are skipped and
15891729
/// ignored (typically used for registry or git dependencies where we assume
15901730
/// the source never changes, and we don't want the cost of running `stat` on
1591-
/// all those files).
1731+
/// all those files). See the module-level docs for the note about
1732+
/// `-Zbinary-dep-depinfo` for more details on why this is done.
15921733
///
15931734
/// The serialized Cargo format will contain a list of files, all of which are
15941735
/// relative if they're under `root`, or absolute if they're elsewhere.
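The core fix in `prepare_target` can be illustrated in isolation. This is a simplified sketch under stated assumptions, not Cargo's code: `clear_fingerprint` and `looks_fresh` are hypothetical helpers, the fingerprint is modelled as a plain hash string in a single file, and `std::fs::write` stands in for Cargo's internal `paths::write`.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Truncate an existing fingerprint file before starting a build, so that
/// an interrupted build cannot leave a stale "valid" hash behind.
/// Truncating (rather than deleting) keeps the file readable afterwards.
fn clear_fingerprint(loc: &Path) -> io::Result<()> {
    if loc.exists() {
        fs::write(loc, b"")?;
    }
    Ok(())
}

/// Hypothetical freshness test: the unit looks fresh only if the hash on
/// disk equals the newly computed one. An empty or missing file never matches.
fn looks_fresh(loc: &Path, new_hash: &str) -> bool {
    match fs::read_to_string(loc) {
        Ok(old) => !old.is_empty() && old == new_hash,
        Err(_) => false,
    }
}

fn main() -> io::Result<()> {
    let loc = std::env::temp_dir().join(format!("fingerprint-sketch-{}", std::process::id()));

    // Step 1: a successful build wrote a fingerprint.
    fs::write(&loc, "0123abcd")?;
    assert!(looks_fresh(&loc, "0123abcd"));

    // Step 3: the next build starts, clears the fingerprint, and is then
    // interrupted before it can write a new one.
    clear_fingerprint(&loc)?;

    // Step 4: the unit is no longer mistaken for fresh.
    assert!(!looks_fresh(&loc, "0123abcd"));

    fs::remove_file(&loc)?;
    Ok(())
}
```

The design choice mirrored here is that the fingerprint is invalidated before the work starts and only rewritten after it succeeds, so the on-disk state never claims more than what was actually completed.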

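The source-file mtime rule from the module docs (stale if the dep-info is missing, if any input is missing, or if any input is newer than the dep-info) can be sketched as a pure function. This is an illustration only: `Status` and `check_sources` are hypothetical names, mtimes are passed in as `Option<SystemTime>` instead of being read from disk, and the strict "newer than" comparison is an assumption about how ties are treated.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical result type; Cargo stores its own status in the fingerprint.
#[derive(Debug, PartialEq)]
enum Status {
    Fresh,
    Stale(&'static str),
}

/// Compare each source mtime against the dep-info mtime, which acts as the
/// anchor for "when the last build happened". `None` models a missing file.
fn check_sources(
    dep_info_mtime: Option<SystemTime>,
    source_mtimes: &[Option<SystemTime>],
) -> Status {
    let reference = match dep_info_mtime {
        Some(t) => t,
        None => return Status::Stale("dep-info missing"),
    };
    for mtime in source_mtimes {
        match mtime {
            None => return Status::Stale("source file missing"),
            // Assumption: only strictly newer sources count as modified.
            Some(t) if *t > reference => return Status::Stale("source newer than dep-info"),
            Some(_) => {}
        }
    }
    Status::Fresh
}

fn main() {
    let built = SystemTime::UNIX_EPOCH + Duration::from_secs(1_000);
    let older = Some(built - Duration::from_secs(10));
    let newer = Some(built + Duration::from_secs(10));

    println!("{:?}", check_sources(Some(built), &[older, older]));
    println!("{:?}", check_sources(Some(built), &[older, newer]));
    println!("{:?}", check_sources(None, &[older]));
}
```

Rewinding the dep-info mtime to the build's start time, as the docs describe, moves `reference` earlier, so a source edited mid-build compares as newer and triggers a rebuild next time.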