|
5 | 5 | //! (needs to be recompiled) or "fresh" (it does not need to be recompiled).
|
6 | 6 | //! There are several mechanisms that influence a Unit's freshness:
|
7 | 7 | //!
|
8 |
| -//! - The `Metadata` hash isolates each Unit on the filesystem by being |
9 |
| -//! embedded in the filename. If something in the hash changes, then the |
10 |
| -//! output files will be missing, and the Unit will be dirty (missing |
11 |
| -//! outputs are considered "dirty"). |
12 |
| -//! - The `Fingerprint` is another hash, saved to the filesystem in the |
13 |
| -//! `.fingerprint` directory, that tracks information about the inputs to a |
14 |
| -//! Unit. If any of the inputs changes from the last compilation, then the |
15 |
| -//! Unit is considered dirty. A missing fingerprint (such as during the |
16 |
| -//! first build) is also considered dirty. |
17 |
| -//! - Whether or not input files are actually present. For example a build |
18 |
| -//! script which says it depends on a nonexistent file `foo` is always rerun. |
19 |
| -//! - Propagation throughout the dependency graph of file modification time |
20 |
| -//! information, used to detect changes on the filesystem. Each `Fingerprint` |
21 |
| -//! keeps track of what files it'll be processing, and when necessary it will |
22 |
| -//! check the `mtime` of each file (last modification time) and compare it to |
23 |
| -//! dependencies and output to see if files have been changed or if a change |
24 |
| -//! needs to force recompiles of downstream dependencies. |
| 8 | +//! - The `Fingerprint` is a hash, saved to the filesystem in the |
| 9 | +//! `.fingerprint` directory, that tracks information about the Unit. If the |
| 10 | +//! fingerprint is missing (such as the first time the unit is being |
| 11 | +//! compiled), then the unit is dirty. If any of the fingerprint fields |
| 12 | +//! change (like the name of the source file), then the Unit is considered |
| 13 | +//! dirty. |
| 14 | +//! |
| 15 | +//! The `Fingerprint` also tracks the fingerprints of all its dependencies, |
| 16 | +//! so a change in a dependency will propagate the "dirty" status up. |
| 17 | +//! |
| 18 | +//! - Filesystem mtime tracking is also used to check if a unit is dirty. |
| 19 | +//! See the section below on "Mtime comparison" for more details. There |
| 20 | +//! are essentially two parts to mtime tracking: |
| 21 | +//! |
| 22 | +//! 1. The mtime of a Unit's output files is compared to the mtime of all |
| 23 | +//! its dependencies' output file mtimes (see `check_filesystem`). If any |
| 24 | +//! output is missing, or is older than a dependency's output, then the |
| 25 | +//! unit is dirty. |
| 26 | +//! 2. The mtime of a Unit's source files is compared to the mtime of its |
| 27 | +//! dep-info file in the fingerprint directory (see `find_stale_file`). |
| 28 | +//! The dep-info file is used as an anchor to know when the last build of |
| 29 | +//! the unit was done. See the "dep-info files" section below for more |
| 30 | +//! details. If any input files are missing, or are newer than the |
| 31 | +//! dep-info, then the unit is dirty. |
25 | 32 | //!
|
26 | 33 | //! Note: Fingerprinting is not a perfect solution. Filesystem mtime tracking
|
27 | 34 | //! is notoriously imprecise and problematic. Only a small part of the
|
|
33 | 40 | //!
|
34 | 41 | //! ## Fingerprints and Metadata
|
35 | 42 | //!
|
| 43 | +//! The `Metadata` hash is a hash added to the output filenames to isolate |
| 44 | +//! each unit. See the documentation in the `compilation_files` module for |
| 45 | +//! more details. NOTE: Not all output files are isolated via filename hashes |
| 46 | +//! (like dylibs), but the fingerprint directory always has the `Metadata` |
| 47 | +//! hash in its directory name. |
| 48 | +//! |
36 | 49 | //! Fingerprints and Metadata are similar, and track some of the same things.
|
37 | 50 | //! The Metadata contains information that is required to keep Units separate.
|
38 | 51 | //! The Fingerprint includes additional information that should cause a
|
39 |
| -//! recompile, but it is desired to reuse the same filenames. Generally the |
40 |
| -//! items in the Metadata do not need to be in the Fingerprint. A comparison |
| 52 | +//! recompile, but it is desired to reuse the same filenames. A comparison |
41 | 53 | //! of what is tracked:
|
42 | 54 | //!
|
43 | 55 | //! Value | Fingerprint | Metadata
|
|
54 | 66 | //! __CARGO_DEFAULT_LIB_METADATA[^4] | | ✓
|
55 | 67 | //! package_id | | ✓
|
56 | 68 | //! authors, description, homepage, repo | ✓ |
|
57 |
| -//! Target src path | ✓ | |
58 |
| -//! Target path relative to ws | ✓ | |
| 69 | +//! Target src path relative to ws | ✓ | |
59 | 70 | //! Target flags (test/bench/for_host/edition) | ✓ |
|
60 | 71 | //! -C incremental=… flag | ✓ |
|
61 | 72 | //! mtime of sources | ✓[^3] |
|
|
64 | 75 | //!
|
65 | 76 | //! [^1]: Build script and bin dependencies are not included.
|
66 | 77 | //!
|
67 |
| -//! [^3]: The mtime is only tracked for workspace members and path |
68 |
| -//! dependencies. Git dependencies track the git revision. |
| 78 | +//! [^3]: See below for details on mtime tracking. |
69 | 79 | //!
|
70 | 80 | //! [^4]: `__CARGO_DEFAULT_LIB_METADATA` is set by rustbuild to embed the
|
71 | 81 | //! release channel (bootstrap/stable/beta/nightly) in libstd.
|
72 | 82 | //!
|
| 83 | +//! When deciding what should go in the Metadata vs the Fingerprint, consider |
| 84 | +//! that some files (like dylibs) do not have a hash in their filename. Thus, |
| 85 | +//! if a value changes, only the fingerprint will detect the change (consider, |
| 86 | +//! for example, swapping between different features). Fields that are only in |
| 87 | +//! Metadata generally aren't relevant to the fingerprint because they |
| 88 | +//! fundamentally change the output (like target vs host changes the directory |
| 89 | +//! where it is emitted). |
| 90 | +//! |
73 | 91 | //! ## Fingerprint files
|
74 | 92 | //!
|
75 | 93 | //! Fingerprint information is stored in the
|
|
83 | 101 | //! `CARGO_LOG=cargo::core::compiler::fingerprint=trace cargo build` can be
|
84 | 102 | //! used to display this log information.
|
85 | 103 | //! - A "dep-info" file which contains a list of source filenames for the
|
86 |
| -//! target. This is produced by reading the output of `rustc |
87 |
| -//! --emit=dep-info` and packing it into a condensed format. Cargo uses this |
88 |
| -//! to check the mtime of every file to see if any of them have changed. |
| 104 | +//! target. See below for details. |
89 | 105 | //! - An `invoked.timestamp` file whose filesystem mtime is updated every time
|
90 | 106 | //! the Unit is built. This is an experimental feature used for cleaning
|
91 | 107 | //! unused artifacts.
|
|
110 | 126 | //! all dependencies, when it is updated, by using `Arc` clones, it
|
111 | 127 | //! automatically picks up the updates to its dependencies.
|
112 | 128 | //!
|
| 129 | +//! ### dep-info files |
| 130 | +//! |
| 131 | +//! Cargo passes the `--emit=dep-info` flag to `rustc` so that `rustc` will |
| 132 | +//! generate a "dep info" file (with the `.d` extension). This file uses a |
| 133 | +//! Makefile-like syntax that lists all of the source files used to build |
| 134 | +//! the crate. This file is used by Cargo to know which files to check to see |
| 135 | +//! if the crate will need to be rebuilt. |
| 136 | +//! |
| 137 | +//! After `rustc` exits successfully, Cargo will read the dep info file and |
| 138 | +//! translate it into a binary format that is stored in the fingerprint |
| 139 | +//! directory (`translate_dep_info`). The mtime of the fingerprint dep-info |
| 140 | +//! file itself is used as the reference for comparing the source files to |
| 141 | +//! determine if any of the source files have been modified (see below for |
| 142 | +//! more detail). |
| 143 | +//! |
| 144 | +//! There is also a third dep-info file. Cargo will extend the file created by |
| 145 | +//! rustc with some additional information and save it into the output |
| 146 | +//! directory. This is intended for build system integration. See the |
| 147 | +//! `output_depinfo` module for more detail. |
| 148 | +//! |
| 149 | +//! #### -Zbinary-dep-depinfo |
| 150 | +//! |
| 151 | +//! `rustc` has an experimental flag `-Zbinary-dep-depinfo`. This causes |
| 152 | +//! `rustc` to include binary files (like rlibs) in the dep-info file. This is |
| 153 | +//! primarily to support rustc development, so that Cargo can check the |
| 154 | +//! implicit dependency to the standard library (which lives in the sysroot). |
| 155 | +//! We want Cargo to recompile whenever the standard library rlib/dylibs |
| 156 | +//! change, and this is a generic mechanism to make that work. |
| 157 | +//! |
| 158 | +//! ### Mtime comparison |
| 159 | +//! |
| 160 | +//! The use of modification timestamps is the most common way a unit will be |
| 161 | +//! determined to be dirty or fresh between builds. There are many subtle |
| 162 | +//! issues and edge cases with mtime comparisons. This gives a high-level |
| 163 | +//! overview, but you'll need to read the code for the gritty details. Mtime |
| 164 | +//! handling is different for different unit kinds. The different styles are |
| 165 | +//! driven by the `Fingerprint.local` field, which is set based on the unit |
| 166 | +//! kind. |
| 167 | +//! |
| 168 | +//! The status of whether or not the mtime is "stale" or "up-to-date" is |
| 169 | +//! stored in `Fingerprint.fs_status`. |
| 170 | +//! |
| 171 | +//! Every unit compares the mtime of its newest output file with the mtimes |
| 172 | +//! of the outputs of all its dependencies. If any output file is missing, |
| 173 | +//! then the unit is stale. If any dependency is newer, the unit is stale. |
| 174 | +//! |
| 175 | +//! #### Normal package mtime handling |
| 176 | +//! |
| 177 | +//! `LocalFingerprint::CheckDepInfo` is used for checking the mtime of |
| 178 | +//! packages. It compares the mtime of the input files (the source files) to |
| 179 | +//! the mtime of the dep-info file (which is written last after a build is |
| 180 | +//! finished). If the dep-info is missing, the unit is stale (it has never |
| 181 | +//! been built). The list of input files comes from the dep-info file. See the |
| 182 | +//! section above for details on dep-info files. |
| 183 | +//! |
| 184 | +//! Also note that although registry and git packages use `CheckDepInfo`, none |
| 185 | +//! of their source files are included in the dep-info (see |
| 186 | +//! `translate_dep_info`), so for those kinds no mtime checking is done |
| 187 | +//! (unless `-Zbinary-dep-depinfo` is used). Registry and git packages are |
| 188 | +//! static, so there is no need to check anything. |
| 189 | +//! |
| 190 | +//! When a build is complete, the mtime of the dep-info file in the |
| 191 | +//! fingerprint directory is modified to rewind it to the time when the build |
| 192 | +//! started. This is done by creating an `invoked.timestamp` file when the |
| 193 | +//! build starts to capture the start time. The mtime is rewound to the start |
| 194 | +//! to handle the case where the user modifies a source file while a build is |
| 195 | +//! running. Cargo can't know whether or not the file was included in the |
| 196 | +//! build, so it takes a conservative approach of assuming the file was *not* |
| 197 | +//! included, and it should be rebuilt during the next build. |
| 198 | +//! |
| 199 | +//! #### Rustdoc mtime handling |
| 200 | +//! |
| 201 | +//! Rustdoc does not emit a dep-info file, so Cargo currently has a relatively |
| 202 | +//! simple system for detecting rebuilds. `LocalFingerprint::Precalculated` is |
| 203 | +//! used for rustdoc units. For registry packages, this is the package |
| 204 | +//! version. For git packages, it is the git hash. For path packages, it is |
| 205 | +//! a string of the mtime of the newest file in the package. |
| 206 | +//! |
| 207 | +//! There are some known bugs with how this works, so it should be improved at |
| 208 | +//! some point. |
| 209 | +//! |
| 210 | +//! #### Build script mtime handling |
| 211 | +//! |
| 212 | +//! Build script mtime handling runs in different modes. There is the "old |
| 213 | +//! style" where the build script does not emit any `rerun-if` directives. In |
| 214 | +//! this mode, Cargo will use `LocalFingerprint::Precalculated`. See the |
| 215 | +//! "rustdoc" section above for how it works. |
| 216 | +//! |
| 217 | +//! In the new-style, each `rerun-if` directive is translated to the |
| 218 | +//! corresponding `LocalFingerprint` variant. The `RerunIfChanged` variant |
| 219 | +//! compares the mtime of the given filenames against the mtime of the |
| 220 | +//! "output" file. |
| 221 | +//! |
| 222 | +//! Similar to normal units, the build script "output" file mtime is rewound |
| 223 | +//! to the time just before the build script is executed to handle mid-build |
| 224 | +//! modifications. |
| 225 | +//! |
113 | 226 | //! ## Considerations for inclusion in a fingerprint
|
114 | 227 | //!
|
115 | 228 | //! Over time we've realized a few items which historically were included in
|
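The source-file half of the mtime check described in the module docs above can be illustrated with a small standalone sketch. This is not Cargo's `find_stale_file`; the names `mtime_of` and `first_stale_input` are hypothetical, and treating an input whose mtime equals the reference as fresh is an assumption made here for simplicity.

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::time::{Duration, SystemTime};

/// Returns the mtime of `path`, or `None` if the file is missing or unreadable.
/// (Hypothetical helper, shown only to connect the sketch to a real filesystem.)
fn mtime_of(path: &Path) -> Option<SystemTime> {
    fs::metadata(path).and_then(|m| m.modified()).ok()
}

/// Returns the first input that is missing (`None` mtime) or strictly newer
/// than `reference`, which plays the role of the dep-info file's mtime.
fn first_stale_input(
    reference: SystemTime,
    inputs: &[(PathBuf, Option<SystemTime>)],
) -> Option<&Path> {
    inputs.iter().find_map(|(path, mtime)| match mtime {
        // Assumption: an input as old as, or older than, the reference is fresh.
        Some(t) if *t <= reference => None,
        // Missing or newer inputs make the unit dirty.
        _ => Some(path.as_path()),
    })
}

fn main() {
    // Pretend the last build finished 100 seconds after the epoch.
    let built = SystemTime::UNIX_EPOCH + Duration::from_secs(100);
    let inputs = vec![
        (PathBuf::from("src/lib.rs"), Some(built - Duration::from_secs(10))),
        (PathBuf::from("src/new.rs"), Some(built + Duration::from_secs(5))),
    ];
    match first_stale_input(built, &inputs) {
        Some(path) => println!("dirty: {} is missing or newer", path.display()),
        None => println!("fresh"),
    }
    // With real files, the mtimes would come from the filesystem instead.
    let _ = mtime_of(Path::new("Cargo.toml"));
}
```

Passing the mtimes in as plain values keeps the comparison logic testable without touching the filesystem.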
@@ -277,6 +390,40 @@ pub fn prepare_target<'a, 'cfg>(
|
277 | 390 | return Ok(Job::new(Work::noop(), Fresh));
|
278 | 391 | }
|
279 | 392 |
|
| 393 | + // Clear out the old fingerprint file if it exists. This protects against |
| 394 | + // an interrupted compilation leaving a corrupt file. For example, a |
| 395 | + // project with a lib.rs and integration test (two units): |
| 396 | + // |
| 397 | + // 1. Build the library and integration test. |
| 398 | + // 2. Make a change to lib.rs (NOT the integration test). |
| 399 | + // 3. Build the integration test, hit Ctrl-C while linking. With gcc, this |
| 400 | + // will leave behind an incomplete executable (zero size, or partially |
| 401 | + // written). NOTE: The library builds successfully, it is the linking |
| 402 | + // of the integration test that we are interrupting. |
| 403 | + // 4. Build the integration test again. |
| 404 | + // |
| 405 | + // Without the following line, step 3 will leave a valid fingerprint |
| 406 | + // on the disk. Then step 4 will think the integration test is "fresh" |
| 407 | + // because: |
| 408 | + // |
| 409 | + // - There is a valid fingerprint hash on disk (written in step 1). |
| 410 | + // - The mtime of the output file (the corrupt integration executable |
| 411 | + // written in step 3) is newer than all of its dependencies. |
| 412 | + // - The mtime of the integration test fingerprint dep-info file (written |
| 413 | + // in step 1) is newer than the integration test's source files, because |
| 414 | + // we haven't modified any of its source files. |
| 415 | + // |
| 416 | + // But the executable is corrupt and needs to be rebuilt. Clearing the |
| 417 | + // fingerprint at step 3 ensures that Cargo never mistakes a partially |
| 418 | + // written output as up-to-date. |
| 419 | + if loc.exists() { |
| 420 | + // Truncate instead of delete so that compare_old_fingerprint will |
| 421 | + // still log the reason for the fingerprint failure instead of just |
| 422 | + // reporting "failed to read fingerprint" during the next build if |
| 423 | + // this build fails. |
| 424 | + paths::write(&loc, b"")?; |
| 425 | + } |
| 426 | + |
280 | 427 | let write_fingerprint = if unit.mode.is_run_custom_build() {
|
281 | 428 | // For build scripts the `local` field of the fingerprint may change
|
282 | 429 | // while we're executing it. For example it could be in the legacy
|
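The truncate-before-build idea in the hunk above can be tried in isolation with the sketch below. It uses only the standard library rather than Cargo's `paths` helpers, and `invalidate_fingerprint` and `hash_matches` are hypothetical names introduced for illustration. The point it shows is that an emptied fingerprint file can never match a freshly computed hash, so an interrupted build cannot be mistaken for a fresh one.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Empties the fingerprint file if it exists. The file is truncated rather
/// than deleted, mirroring the reasoning in the comment above.
fn invalidate_fingerprint(loc: &Path) -> io::Result<()> {
    if loc.exists() {
        fs::write(loc, b"")?;
    }
    Ok(())
}

/// Returns true only if the stored hash is non-empty and equals `new_hash`.
/// A missing, unreadable, or empty file is treated as "not fresh".
fn hash_matches(loc: &Path, new_hash: &str) -> bool {
    match fs::read_to_string(loc) {
        Ok(old) => !old.is_empty() && old == new_hash,
        Err(_) => false,
    }
}

fn main() -> io::Result<()> {
    // Hypothetical scratch location; Cargo stores these under `.fingerprint`.
    let loc = std::env::temp_dir().join(format!("fingerprint-sketch-{}", std::process::id()));

    // A previous successful build wrote the hash.
    fs::write(&loc, "abc123")?;
    assert!(hash_matches(&loc, "abc123"));

    // A new build starts: clear the hash before doing any work.
    invalidate_fingerprint(&loc)?;

    // If the build were interrupted here, the next run would not see a match.
    assert!(!hash_matches(&loc, "abc123"));

    fs::remove_file(&loc)?;
    Ok(())
}
```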
@@ -484,9 +631,8 @@ impl<'de> Deserialize<'de> for DepFingerprint {
|
484 | 631 | #[derive(Debug, Serialize, Deserialize, Hash)]
|
485 | 632 | enum LocalFingerprint {
|
486 | 633 | /// This is a precalculated fingerprint which has an opaque string we just
|
487 |
| - /// hash as usual. This variant is primarily used for git/crates.io |
488 |
| - /// dependencies where the source never changes so we can quickly conclude |
489 |
| - /// that there's some string we can hash and it won't really change much. |
| 634 | + /// hash as usual. This variant is primarily used for rustdoc where we |
| 635 | + /// don't have a dep-info file to compare against. |
490 | 636 | ///
|
491 | 637 | /// This is also used for build scripts with no `rerun-if-*` statements, but
|
492 | 638 | /// that's overall a mistake and causes bugs in Cargo. We shouldn't use this
|
@@ -1072,19 +1218,16 @@ fn calculate_normal<'a, 'cfg>(
|
1072 | 1218 | .collect::<CargoResult<Vec<_>>>()?;
|
1073 | 1219 | deps.sort_by(|a, b| a.pkg_id.cmp(&b.pkg_id));
|
1074 | 1220 |
|
1075 |
| - // Afterwards calculate our own fingerprint information. We specially |
1076 |
| - // handle `path` packages to ensure we track files on the filesystem |
1077 |
| - // correctly, but otherwise upstream packages like from crates.io or git |
1078 |
| - // get bland fingerprints because they don't change without their |
1079 |
| - // `PackageId` changing. |
| 1221 | + // Afterwards calculate our own fingerprint information. |
1080 | 1222 | let target_root = target_root(cx);
|
1081 |
| - let local = if use_dep_info(unit) { |
| 1223 | + let local = if unit.mode.is_doc() { |
| 1224 | + // rustdoc does not have dep-info files. |
| 1225 | + let fingerprint = pkg_fingerprint(cx.bcx, unit.pkg)?; |
| 1226 | + vec![LocalFingerprint::Precalculated(fingerprint)] |
| 1227 | + } else { |
1082 | 1228 | let dep_info = dep_info_loc(cx, unit);
|
1083 | 1229 | let dep_info = dep_info.strip_prefix(&target_root).unwrap().to_path_buf();
|
1084 | 1230 | vec![LocalFingerprint::CheckDepInfo { dep_info }]
|
1085 |
| - } else { |
1086 |
| - let fingerprint = pkg_fingerprint(cx.bcx, unit.pkg)?; |
1087 |
| - vec![LocalFingerprint::Precalculated(fingerprint)] |
1088 | 1231 | };
|
1089 | 1232 |
|
1090 | 1233 | // Figure out what the outputs of our unit are, and we'll be storing them
|
@@ -1128,12 +1271,6 @@ fn calculate_normal<'a, 'cfg>(
|
1128 | 1271 | })
|
1129 | 1272 | }
|
1130 | 1273 |
|
1131 |
| -/// Whether or not the fingerprint should track the dependencies from the |
1132 |
| -/// dep-info file for this unit. |
1133 |
| -fn use_dep_info(unit: &Unit<'_>) -> bool { |
1134 |
| - !unit.mode.is_doc() |
1135 |
| -} |
1136 |
| - |
1137 | 1274 | /// Calculate a fingerprint for an "execute a build script" unit. This is an
|
1138 | 1275 | /// internal helper of `calculate`, don't call directly.
|
1139 | 1276 | fn calculate_run_custom_build<'a, 'cfg>(
|
@@ -1412,7 +1549,10 @@ fn compare_old_fingerprint(
|
1412 | 1549 | let old_fingerprint_json = paths::read(&loc.with_extension("json"))?;
|
1413 | 1550 | let old_fingerprint: Fingerprint = serde_json::from_str(&old_fingerprint_json)
|
1414 | 1551 | .chain_err(|| internal("failed to deserialize json"))?;
|
1415 |
| - debug_assert_eq!(util::to_hex(old_fingerprint.hash()), old_fingerprint_short); |
| 1552 | + // Fingerprint can be empty after a failed rebuild (see comment in prepare_target). |
| 1553 | + if !old_fingerprint_short.is_empty() { |
| 1554 | + debug_assert_eq!(util::to_hex(old_fingerprint.hash()), old_fingerprint_short); |
| 1555 | + } |
1416 | 1556 | let result = new_fingerprint.compare(&old_fingerprint);
|
1417 | 1557 | assert!(result.is_err());
|
1418 | 1558 | result
|
@@ -1588,7 +1728,8 @@ impl DepInfoPathType {
|
1588 | 1728 | /// included. If it is false, then package-relative paths are skipped and
|
1589 | 1729 | /// ignored (typically used for registry or git dependencies where we assume
|
1590 | 1730 | /// the source never changes, and we don't want the cost of running `stat` on
|
1591 |
| -/// all those files). |
| 1731 | +/// all those files). See the module-level docs for the note about |
| 1732 | +/// `-Zbinary-dep-depinfo` for more details on why this is done. |
1592 | 1733 | ///
|
1593 | 1734 | /// The serialized Cargo format will contain a list of files, all of which are
|
1594 | 1735 | /// relative if they're under `root`, or absolute if they're elsewhere.
|
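To make the "relative if under `root`, otherwise absolute" rule concrete, here is a rough sketch of reading one Makefile-style dep-info line and classifying each listed path. It is not Cargo's `translate_dep_info`: the `Recorded` enum and both function names are hypothetical, and the parser deliberately ignores escaped spaces and multi-line rules, which a real `.d` parser would need to handle.

```rust
use std::path::{Path, PathBuf};

/// How a dependency path would be recorded (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Recorded {
    RelativeToRoot(PathBuf),
    Absolute(PathBuf),
}

/// Splits a single `target: dep1 dep2` line into its dependency paths.
/// Assumption: no escaped spaces; lines without `: ` yield no dependencies.
fn parse_dep_info_line(line: &str) -> Vec<PathBuf> {
    match line.split_once(": ") {
        Some((_target, deps)) => deps.split_whitespace().map(PathBuf::from).collect(),
        None => Vec::new(),
    }
}

/// Records `path` relative to `root` when it lives under `root`,
/// and as an absolute path otherwise.
fn record(root: &Path, path: &Path) -> Recorded {
    match path.strip_prefix(root) {
        Ok(rel) => Recorded::RelativeToRoot(rel.to_path_buf()),
        Err(_) => Recorded::Absolute(path.to_path_buf()),
    }
}

fn main() {
    let line = "/ws/target/debug/libfoo.rlib: /ws/src/lib.rs /other/gen.rs";
    let root = Path::new("/ws");
    for dep in parse_dep_info_line(line) {
        println!("{:?}", record(root, &dep));
    }
}
```

Storing paths relative to the root keeps the recorded list valid if the whole workspace directory is moved, while paths outside the root have no such anchor and stay absolute.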
|