
feat!: detect SHA‐1 collision attacks #1915


Merged
merged 22 commits on Apr 3, 2025

Conversation

emilazy
Contributor

@emilazy emilazy commented Mar 31, 2025

Fix GHSA-2frx-2596-x5r6.

See commit messages for details, especially the last one, which has some background on the choices made here and benchmark numbers. This is best reviewed commit‐by‐commit.

Closes: #585

Tasks for @emilazy

  • split up `change!: move the hashing API to gix_hash` into a `feat` for `gix-hash`, a `change!` for `gix-features` afterwards, and add an `adapt to changes in gix-features and gix-hash` commit with all the fixes in the now-broken crates.
    • Conventional commit messages drive cargo smart-release and also the changelog generation, keeping the changelog messages precise. The idea is to not taint crates with breaking changes unnecessarily.
    • Yes, doing so means that once gix-features sees its functions removed, the tree will be broken until the next commit that fixes it.
  • 'change!: make the hashing API return ObjectId' should be split up into the breaking change, and the 'adapt' commit right after
    • Very much the same as above.

More notes:

  • 'change!: migrate hash verification to the common interface' would also have been a candidate to split, but I thought these crates are tainted anyway by gix-hash, and the change is indeed breaking for all (or most) of them. So it's fine also in the changelogs.
    • This also applies to 'change!: use separate error type for I/O hashing operations'.

Byron's Tasks

  • refactor gix-hash tests
  • look at every line in detail

@emilazy emilazy force-pushed the push-qvyqmopsoltr branch 3 times, most recently from 47ad297 to cd6674e on March 31, 2025 01:48
@Byron
Member

Byron commented Mar 31, 2025

Thanks so much for tackling this!

It's a happy and a sad day at the same time. Without a review, I thought I'd give it a quick performance comparison with the operation I know stresses the hasher the most.

❯ hyperfine -w 1 -M 1 'gix --verbose no-repo pack verify ./tests/fixtures/repos/linux.git/objects/pack/pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.idx' './target/release/gix --verbose no-repo pack verify ./tests/fixtures/repos/linux.git/objects/pack/pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.idx'
Benchmark 1: gix --verbose no-repo pack verify ./tests/fixtures/repos/linux.git/objects/pack/pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.idx
  Time (abs ≡):        11.399 s               [User: 70.087 s, System: 2.946 s]

Benchmark 2: ./target/release/gix --verbose no-repo pack verify ./tests/fixtures/repos/linux.git/objects/pack/pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.idx
  Time (abs ≡):        27.211 s               [User: 229.958 s, System: 2.932 s]

Summary
  gix --verbose no-repo pack verify ./tests/fixtures/repos/linux.git/objects/pack/pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.idx ran
    2.39 times faster than ./target/release/gix --verbose no-repo pack verify ./tests/fixtures/repos/linux.git/objects/pack/pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.idx

And here are single runs, in comparison:

❯ cargo run --release --bin gix -- --verbose no-repo pack verify ./tests/fixtures/repos/linux.git/objects/pack/pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.idx
    Finished `release` profile [optimized] target(s) in 0.67s
     Running `target/release/gix --verbose no-repo pack verify ./tests/fixtures/repos/linux.git/objects/pack/pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.idx`
 10:54:09 Hash of index 'pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.idx' done 212.8MB in 0.42s (504.3MB/s)
 10:54:09                                           collecting sorted index done 7.6M entries in 0.48s (15.7M entries/s)
 10:54:10                                                          indexing done 7.6M objects in 1.77s (4.3M objects/s)
 10:54:11 Hash of pack 'pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.pack' done 1.4GB in 2.73s (498.8MB/s)
 10:54:35                                                         Resolving done 7.6M objects in 24.47s (310.5K objects/s)
 10:54:35                                                          Decoding done 96.0GB in 24.47s (3.9GB/s)
❯ gix --verbose no-repo pack verify ./tests/fixtures/repos/linux.git/objects/pack/pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.idx
 10:54:55 Hash of index 'pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.idx' done 212.8MB in 0.12s (1.8GB/s)
 10:54:56                                           collecting sorted index done 7.6M entries in 0.48s (15.8M entries/s)
 10:54:56 Hash of pack 'pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.pack' done 1.4GB in 0.75s (1.8GB/s)
 10:54:58                                                          indexing done 7.6M objects in 1.90s (4.0M objects/s)
 10:55:05                                                         Resolving done 7.6M objects in 7.69s (988.2K objects/s)
 10:55:05                                                          Decoding done 96.0GB in 7.69s (12.5GB/s)

Judging by a bare hash-object call to Git, it's so ridiculously slow that I have a feeling it's not a good way to evaluate the actual hashing performance.

❯ time git hash-object -t blob ./tests/fixtures/repos/linux.git/objects/pack/pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.pack
f9e03f3b35b34bb0c992414a94e7e14e1a30e63e
git hash-object -t blob   28.65s user 0.35s system 99% cpu 29.054 total

gitoxide ( push-qvyqmopsoltr) [$?] took 29s
❯ l ./tests/fixtures/repos/linux.git/objects/pack/pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.pack
.rw-r--r--@ 1.3Gi byron staff 12 Aug  2020 ./tests/fixtures/repos/linux.git/objects/pack/pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.pack

This speed would put it at about ~50 MB/s, which is nonsense.

I will take a closer look in the coming days.

@emilazy emilazy force-pushed the push-qvyqmopsoltr branch 6 times, most recently from c8f2692 to 34e72be on March 31, 2025 14:26
@emilazy
Contributor Author

emilazy commented Mar 31, 2025

This is now passing CI.

The performance impact is definitely sad, though hopefully the performance of most operations is less directly bound by SHA‐1 hashing speed. As I detailed in the commit message, I believe that significant gains could be made with a SIMD‐accelerated implementation, possibly even using the SHA instruction set extensions for some of the computation where available. I’m no SIMD expert, but may be increasingly on the path to nerd‐sniping myself into trying to implement that, especially if this impacts Jujutsu performance too much…

@Byron
Member

Byron commented Mar 31, 2025

Thanks so much!

I’m no SIMD expert, but may be increasingly on the path to nerd‐sniping myself into trying to implement that, especially if this impacts Jujutsu performance too much…

I don't think it will except for when cloning repositories, assuming gitoxide is used for that at all. Of course, creating a bunch of objects now is slower as well, but the time for that is probably still dominated by using DEFLATE for compression.

But there is an elephant in the room which has been camouflaged quite well: git2 will verify each loaded object to ensure its hash actually matches what's advertised. Thus, if you had a shattered object already, it would not let you have it. Also, bitrot won't be an issue there (if it can be an issue at all, given that everything is INFLATEd before).
However, gitoxide doesn't currently do that and has no option to turn it on; git2 allows turning it off, though.

Now that I've mentioned this, I think a viable next step is to make verification at least optional; once that is enabled, SHA1 will be in the way of every decoded object as well. Would that impact performance? Probably not much, but it will certainly be measurable (ein t hours reads all commits for analysis; 145k/s is my baseline, which would surely go down a bit then), but we get there when we get there.

@emilazy
Contributor Author

emilazy commented Mar 31, 2025

I don't think it will except for when cloning repositories, assuming gitoxide is used for that at all.

Right. We shell out to git(1) for fetches and pushes for maximum compatibility currently; previously we were using git2. So no impact there. I was more thinking about snapshotting large repositories or doing large rebases (or both, if you edit a commit some way down the tree), since those could potentially create a bunch of objects. But it’s probably not too significant compared to other overhead. Still, if I end up cooking up something with SIMD and it seems to bring back some of the performance I’ll let you know :)

(Does git(1) do that verification? git2 fetch and push performance was always very bad for us, and I wonder if that’s part of why.)

@Byron
Member

Byron commented Mar 31, 2025

Yes, I will be looking forward to that SIMD implementation of a collision-proof SHA1 :).

❯ hyperfine 'sha256 ./tests/fixtures/repos/linux.git/objects/pack/pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.pack' 'sha1 ./tests/fixtures/repos/linux.git/objects/pack/pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.pack'
Benchmark 1: sha256 ./tests/fixtures/repos/linux.git/objects/pack/pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.pack
  Time (mean ± σ):     810.4 ms ±   0.6 ms    [User: 613.6 ms, System: 195.6 ms]
  Range (min … max):   809.3 ms … 811.2 ms    10 runs

Benchmark 2: sha1 ./tests/fixtures/repos/linux.git/objects/pack/pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.pack
  Time (mean ± σ):     835.5 ms ±   4.1 ms    [User: 638.5 ms, System: 195.7 ms]
  Range (min … max):   832.5 ms … 846.7 ms    10 runs

(Does git(1) do that verification? git2 fetch and push performance was always very bad for us, and I wonder if that’s part of why.)

No, it does not.

❯ cp .git/objects/02/bfae32bb6647df0f892205edf660a2dffb421a .git/objects/02/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

git ( master) [?] via 🐍
❯ git cat-file -p 02aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
tree d124b9de1a1b1a243f85312003b7fd3d2ee7bfe6
parent a3998d426d4868eb6d49d443ad4298c0f16c2ab4
author Sebastian Thiel <[email protected]> 1691561336 +0200
committer Sebastian Thiel <[email protected]> 1691561336 +0200

refresh fix-git-mv-existing-dir-non%                                                                                                                                                                                  

Probably this is delegated to git fsck which is expected to run from time to time.

Regarding git2, you can probably experiment with some options:

Member

@Byron Byron left a comment


I took a closer look now, while still focussing on the bigger picture, commit by commit, and liked it very much.

The only reason for requesting changes is splitting commits, and that I am happy to do for you if the instructions I left in the PR description aren't clear or you think I can probably do it faster. Since this touches on retaining authorship, but creating new commits which would force me to create messages 'under your name', I thought I shouldn't just do it though 😁.

Looking at the last commit I thought for a moment that maybe it's not worth splitting commits because ultimately everything gets touched in one commit anyway (and it's good to do that with one commit message to spread the news into changelogs of many crates), but then again I think it should be done just to get better changelog messages.
Again, it's kind of a detail and I am happy to do it.

Once that is decided I think I can finish the review and make remaining changes myself on top of the existing commits.

Assorted notes
  • verify(checksum) is gold
  • the performance tests are great, and I don't feel too bad about it
  • overall I am glad this is done now

@emilazy emilazy force-pushed the push-qvyqmopsoltr branch from 34e72be to bac5c78 on April 2, 2025 16:36
emilazy added 11 commits April 2, 2025 17:37
Fix [GHSA-2frx-2596-x5r6].

[GHSA-2frx-2596-x5r6]: GHSA-2frx-2596-x5r6

This uses the `sha1-checked` crate from the RustCrypto project. It’s
a pure Rust implementation, with no SIMD or assembly code.

The hashing implementation moves to `gix-hash`, as it no longer
depends on any feature configuration. I wasn’t sure the ideal
crate to put this in, but after checking reverse dependencies on
crates.io, it seems like there’s essentially no user of `gix-hash`
that wouldn’t be pulling in a hashing implementation anyway, so I
think this is a fine and logical place for it to be.

A fallible API seems better than killing the process as Git does,
since we’re in a library context and it would be bad if you could
perform denial‐of‐service attacks on a server by sending it hash
collisions. (Although there are probably cheaper ways to mount a
denial‐of‐service attack.)

The new API also returns an `ObjectId` rather than `[u8; 20]`; the
vast majority of `Hasher::digest()` users immediately convert the
result to `ObjectId`, so this will help eliminate a lot of cruft
across the tree. `ObjectId` also has nicer `Debug` and `Display`
instances than `[u8; 20]`, and should theoretically make supporting
the hash function transition easier, although I suspect further API
changes will be required for that anyway. I wasn’t sure whether
this would be a good change, as not every digest identifies an
entry in the Git object database, but even many of the existing
uses for non‐object digests across the tree used the `ObjectId`
API anyway. Perhaps it would be best to have a separate non‐alias
`Digest` type that `ObjectId` wraps, but this seems like the pragmatic
choice for now that sticks with current practice.

The old API remains in this commit, as well as a temporary
non‐fallible but `ObjectId`‐returning `Hasher::finalize()`,
pending the migration of all in‐tree callers.

I named the module `gix_hash::hasher` since `gix_hash::hash` seemed
like it would be confusing. This does mean that there is a function
and module with the same name, which is permitted but perhaps a
little strange.

Everything is re‐exported directly other than
`gix_features::hash::Write`, which moves along with the I/O
convenience functions into a new public submodule and becomes
`gix_hash::hasher::io::Write`, as that seems like a clearer name
to me, being akin to the `gix_hash::hasher` function but as an
`std::io::Write` wrapper.
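The idea of an `std::io::Write` wrapper that hashes everything passing through it can be sketched like this. Again a hypothetical std-only stand-in (the inner "hasher" is a placeholder mixer, not SHA-1), meant only to show the tee-style structure that something like `gix_hash::hasher::io::Write` provides:

```rust
use std::io::{self, Write};

// Hypothetical sketch: an io::Write adapter that forwards bytes to an
// inner writer while feeding them to a hasher. The hash state here is a
// trivial placeholder, not SHA-1.
struct HashWrite<W: Write> {
    inner: W,
    state: u64, // placeholder hash state
}

impl<W: Write> HashWrite<W> {
    fn new(inner: W) -> Self {
        HashWrite { inner, state: 0 }
    }

    /// Consume the wrapper and return the placeholder digest.
    fn digest(self) -> u64 {
        self.state
    }
}

impl<W: Write> Write for HashWrite<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = self.inner.write(buf)?;
        // Hash exactly the bytes the inner writer accepted, so a partial
        // write cannot desynchronize the digest from the written data.
        for &b in &buf[..n] {
            self.state = self.state.wrapping_mul(31).wrapping_add(b as u64);
        }
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

fn main() {
    let mut w = HashWrite::new(Vec::<u8>::new());
    w.write_all(b"some object data").unwrap();
    println!("placeholder digest: {:#x}", w.digest());
}
```

Hashing only the accepted bytes in `write` is the subtle part of any such wrapper; hashing the whole input buffer on a short write would silently corrupt the digest.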

Raw hashing is somewhere around 0.25× to 0.65× the speed of the
previous implementation, depending on the feature configuration
and whether the CPU supports hardware‐accelerated hashing. (The
more portable assembly in `sha1-asm` that doesn’t require the SHA
instruction set doesn’t seem to speed things up that much; in fact,
`sha1_smol` somehow regularly beats the assembly code used by `sha1`
on my i9‐9880H MacBook Pro! Presumably this is why that path was
removed in newer versions of the `sha1` crate.)

Performance on an end‐to‐end `gix no-repo pack verify` benchmark
using pack files from the Linux kernel Git server measures around
0.41× to 0.44× compared to the base commit on an M2 Max and a
Ryzen 7 5800X, both of which have hardware instructions for SHA‐1
acceleration that the previous implementation uses but this one does
not. On the i9‐9880H, it’s around 0.58× to 0.60× the speed;
the slowdown is reduced by the older hardware’s lack of SHA‐1
instructions.

The `sha1collisiondetection` crate from the Sequoia PGP project,
based on a modified C2Rust translation of the library used by Git,
was also considered; although its raw hashing performance seems
to measure around 1.12–1.15× the speed of `sha1-checked` on
x86, it’s indistinguishable from noise on the end‐to‐end
benchmark, and on an M2 Max `sha1-checked` is consistently
around 1.03× the speed of `sha1collisiondetection` on that
benchmark. The `sha1collisiondetection` crate has also had a
soundness issue in the past due to the automatic C translation,
whereas `sha1-checked` has only one trivial `unsafe` block. On the
other hand, `sha1collisiondetection` is used by both Sequoia itself
and the `gitoid` crate, whereas rPGP is the only major user of
`sha1-checked`. I don’t think there’s a clear winner here.

The performance regression is very unfortunate, but the [SHAttered]
attack demonstrated a collision back in 2017, and the 2020 [SHA‐1 is
a Shambles] attack demonstrated a practical chosen‐prefix collision
that broke the use of SHA‐1 in OpenPGP, costing $75k to perform,
with an estimate of $45k to replicate at the time of publication and
$11k for a classical collision.

[SHAttered]: https://shattered.io/
[SHA‐1 is a Shambles]: https://sha-mbles.github.io/

Given the increase in GPU performance and production since then,
that puts the Git object format squarely at risk. Git mitigated this
attack in 2017; the algorithm is fairly general and detects all the
existing public collisions. My understanding is that an entirely new
cryptanalytic approach would be required to develop a collision attack
for SHA‐1 that would not be detected with very high probability.

I believe that the speed penalty could be mitigated, although not
fully eliminated, by implementing a version of the hardened SHA‐1
function that makes use of SIMD. For instance, the assembly code used
by `openssl speed sha1` on my i9‐9880H measures around 830 MiB/s,
compared to the winning 580 MiB/s of `sha1_smol`; adding collision
detection support to that would surely incur a performance penalty,
but it is likely that it could be much more competitive with
the performance before this commit than the 310 MiB/s I get with
`sha1-checked`. I haven’t been able to find any existing work on
this; it seems that more or less everyone just uses the original
C library that Git does, presumably because nothing except Git and
OpenPGP is still relying on SHA‐1 anyway…

The performance will never compete with the >2 GiB/s that can
be achieved with the x86 SHA instruction set extension, as the
`SHA1RNDS4` instruction sadly runs four rounds at a time while the
collision detection algorithm requires checks after every round,
but I believe SIMD would still offer a significant improvement,
and the AArch64 extension seems like it may be more flexible.

I know that these days the Git codebase has an additional faster
unsafe API without these checks that it tries to carefully use only
for operations that do not depend on hashing results for correctness
or safety. I personally believe that’s not a terribly good idea,
as it seems easy to misuse in a case where correctness actually does
matter, but maybe that’s just my Rust safety bias talking. I think
it would be better to focus on improving the performance of the safer
algorithm, as I think that many of the operations where the performance
penalty is the most painful are dealing with untrusted input anyway.

The `Hasher` struct gets a lot bigger; I don’t know if this is
an issue or not, but if it is, it could potentially be boxed.

Closes: GitoxideLabs#585
The hashing API has moved to `gix_hash::hasher`, and we now use
`sha1-checked` unconditionally.
Complete the transition to `ObjectId` returns.
@emilazy emilazy force-pushed the push-qvyqmopsoltr branch from bac5c78 to fdcc33b on April 2, 2025 16:38
@emilazy
Contributor Author

emilazy commented Apr 2, 2025

Makes sense about the changelogs. I really hate to break bisection though, so I’ve reorganized the commits here to hopefully achieve the same goal of no inapplicable changelog entries while attempting to keep the tree green on every commit. The final tree is almost identical save a few more touch‐ups I noticed, but the path there is different, and probably better overall, although there was a bit of ugly contortion around avoiding a cyclic dependency when moving the hashing API and a temporary rename for migration purposes; I find myself really wishing that cargo-smart-release let you mark something as breaking for one sub‐tree a commit touches but as something else for others, but maybe this isn’t a desire you run into often. Let me know if you hate it more than breaking bisection and I can swap things around again to be temporarily broken :)

I don’t mind you pushing to the PR if you want, although if you have other feedback I’d be happy to address it myself to get a better understanding of what’s expected for future contributions.

@emilazy emilazy changed the title fix!: detect SHA‐1 collision attacks feat!: detect SHA‐1 collision attacks Apr 2, 2025
/// Return the actual checksum on success or [`checksum::Error`] if there is a mismatch.
pub fn verify_checksum(&self) -> Result<gix_hash::ObjectId, checksum::Error> {
// Even though we could use gix_hash::bytes_of_file(…), this would require extending our
// Error type to support io::Error. As we only gain progress, there probably isn't much value // as these files are usually small enough to process them in less than a second, even for the large ones.
Contributor


missing line break?

Contributor Author


Good catch, thanks! Fixed.

emilazy added 9 commits April 2, 2025 21:15
This mostly just affects return types – using
`git_hash::verify::Error` instead of bespoke duplicated versions
thereof, and occasionally returning an `ObjectId` instead of `()`
for convenience.
Prepare for hashing becoming fallible.
This does mean a lot of churn across the tree, but the change is
usually just an adjustment to variants of an existing error type,
so I expect that most downstream users will require little to no
adaptation for this change.
`compute_stream_hash` is already fallible, so we don’t want to keep
the `try_*` prefix on the fallible API.
Since the APIs were already adjusted and all callers migrated, we
only need to drop the migration shims.
@emilazy emilazy force-pushed the push-qvyqmopsoltr branch from fdcc33b to a68f115 on April 2, 2025 20:15
@Byron
Member

Byron commented Apr 3, 2025

Thanks a lot!

Makes sense about the changelogs. I really hate to break bisection though, so I’ve reorganized the commits here to hopefully achieve the same goal of no inapplicable changelog entries while attempting to keep the tree green on every commit. The final tree is almost identical save a few more touch‐ups I noticed, but the path there is different, and probably better overall, although there was a bit of ugly contortion around avoiding a cyclic dependency when moving the hashing API and a temporary rename for migration purposes; I find myself really wishing that cargo-smart-release let you mark something as breaking for one sub‐tree a commit touches but as something else for others, but maybe this isn’t a desire you run into often.

I have many desires when it comes to cargo smart-release, but they are all overruled by not wanting to spend any more time than I already have in tooling that should be part of the toolchain :D. cargo smart-release was very costly, and now it's time for it to pay off, probably for another 10 years 😅.

Let me know if you hate it more than breaking bisection and I can swap things around again to be temporarily broken :)

I sense some pain in the paragraph above and don't want to inflict more. From a versioning perspective this changes nothing and I think it's fine to just continue as is, focussing on getting this merged today which I think it will.

I don’t mind you pushing to the PR if you want, although if you have other feedback I’d be happy to address it myself to get a better understanding of what’s expected for future contributions.

That's fair - if there should be major changes I will leave them to you, but small ones I will just do myself on top of yours. My feeling is the PR will be merged when these are done.

Byron added 2 commits April 3, 2025 13:06
- align `gix-hash` integration tests with 'new' style.
- reorganize `hasher` module to avoid duplicate paths to the same types/functions.
- use shorter paths to `gix_hash::hasher|io` everywhere.
@Byron
Member

Byron commented Apr 3, 2025

Thanks a million, this is it!

The good times with incredibly high hashing performance are over, at least until SHA256 lands (#281). If that's no motivation, then I don't know what is :).

More notes
  • I now realize the size of the hasher changed from ~100 bytes to ~800 - an 8x increase is surprising. It turns out that the DetectionState is 724 bytes! Not much to do about that, apparently.
  • I really like the numerous fly-by improvements, and tried to add my own in the case of gix-index.

@Byron Byron enabled auto-merge April 3, 2025 05:27
@Byron Byron merged commit 4660f7a into GitoxideLabs:main Apr 3, 2025
21 checks passed
@emilazy
Contributor Author

emilazy commented Apr 3, 2025

Yeah, we could maybe box the hasher? Not sure if it’d help or hurt performance.
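Boxing the large detection state is easy to illustrate. The sketch below is hypothetical (the struct names and the 724-byte size are stand-ins taken from the figure mentioned above, not the real `sha1-checked` layout); it shows the size trade-off of embedding a big state inline versus behind a `Box`:

```rust
use std::mem::size_of;

// Illustrative only: a large detection state (the real DetectionState is
// reported above as ~724 bytes) embedded directly vs. behind a Box.
struct DetectionState {
    _tables: [u8; 724],
}

struct InlineHasher {
    _core: [u32; 5],
    _detection: DetectionState,
}

struct BoxedHasher {
    _core: [u32; 5],
    _detection: Box<DetectionState>,
}

fn main() {
    // Boxing keeps the hasher value itself small (core state plus one
    // pointer), at the cost of a heap allocation and an extra indirection
    // on every update — whether that helps or hurts is a benchmark question.
    println!(
        "inline: {} bytes, boxed: {} bytes",
        size_of::<InlineHasher>(),
        size_of::<BoxedHasher>()
    );
    assert!(size_of::<BoxedHasher>() < size_of::<InlineHasher>());
}
```

Whether the smaller value pays for the indirection depends on how often the hasher is moved or stored inside larger structs, which is why measuring rather than guessing is the right call here.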

FWIW, exporting gix_hash::hasher::io as gix_hash::io actually wasn’t intended. I think that use gix_hash::hasher; and hasher::io::{Write, Error} make more sense, because they’re specifically for I/O operations that do hashing, and gix_hash::io::Error has no relation to gix_hash::Error, but does relate to gix_hash::hasher::Error. Of course the API design is up to your judgement here, but I would personally recommend making gix_hash::hasher::io the canonical path and not exposing gix_hash::io.

@emilazy emilazy deleted the push-qvyqmopsoltr branch April 3, 2025 16:11
@EliahKagan
Member

I've opened rustsec/advisory-db#2268 to add a RUSTSEC advisory.

(Also, an entry in the GitHub Advisory Database -- i.e. the global version of GHSA-2frx-2596-x5r6 -- will likely be added in the next few days, along with a National Vulnerability Database entry for the associated CVE-2025-31130. These things are taken care of by the GitHub security team.)

@EliahKagan
Member

The vulnerability fixed here has been assigned RUSTSEC-2025-0021.

EliahKagan added a commit to EliahKagan/gitoxide that referenced this pull request Apr 5, 2025
Due to ABI differences between different 32-bit targets, the
`size_of_hasher` test wrongly failed on `i686-pc-windows-msvc`.

Although the test case with that name was introduced in GitoxideLabs#1915, the
failure is actually long-standing, in that an analogous failure
occurred in the old `size_of_sha1` test that preceded it and on
which it is based. That failure only happened when the old
`fast-sha1` feature was enabled, and not with the old `rustsha1`
feature. It was not detected earlier as that target is irregularly
tested, and built with `--no-default-features --features max-pure`
more often than other targets due to difficulties building some
other non-Rust dependencies for it. Since GitoxideLabs#1915, the failure
happens more often, since we now use only one SHA-1 implementation,
`sha1-checked`, so the test always fails on `i686-pc-windows-msvc`.

This changes the test to use `gix_testtools::size_ok`, which makes
a `==` comparison on 64-bit targets but a `<=` comparison on 32-bit
targets where there tends to be more variation in data structures'
sizes. This is similar to the fixes in GitoxideLabs#1687 (77c3c59, fc13fc3).
EliahKagan added a commit to EliahKagan/gitoxide that referenced this pull request Apr 5, 2025
Due to ABI differences between different 32-bit targets, the
`size_of_hasher` test wrongly failed on `i686-pc-windows-msvc`.

Although the test case with that name was introduced in GitoxideLabs#1915, the
failure is actually long-standing, in that an analogous failure
occurred in the old `size_of_sha1` test that preceded it and on
which it is based. That failure only happened when the old
`fast-sha1` feature was enabled, and not with the old `rustsha1`
feature. It was not detected earlier as that target is irregularly
tested, and built with `--no-default-features --features max-pure`
more often than other targets due to difficulties building some
other non-Rust dependencies on it (when not cross-compiling).

Since GitoxideLabs#1915, the failure is easier to detect, since we now use only
one SHA-1 implementation, `sha1-checked`, so the test always fails
on `i686-pc-windows-msvc`. This is only a bug in the test itself,
not in any of the code under test.

This commit changes the test to use `gix_testtools::size_ok`, which
makes a `==` comparison on 64-bit targets but a `<=` comparison on
32-bit targets where there tends to be more variation in data
structures' sizes. This is similar to some of the size assertion
fixes in GitoxideLabs#1687 (77c3c59, fc13fc3).
@EliahKagan
Member

EliahKagan commented Apr 8, 2025

GitHub Advisory Database entry link: CVE-2025-31130

This has been added as one of the references to RUSTSEC-2025-0021.

Although the most important information about the vulnerability for most readers/users is in the advisories, some information is (and more shall soon be) present here beyond what is shown in any of these advisories. It may make sense for further references to be added to the RUSTSEC advisory, such as links to this pull request (and possibly to the older issue #585 that this fixed). Please let me know whether you think I should do that (if you have an opinion).

@EliahKagan
Member

EliahKagan commented Apr 8, 2025

The following is the comment discussion from GHSA-2frx-2596-x5r6, which may be of value or interest in the future and which all commenters agreed could be made public. Because the conversation included mention of the downstream effect on Jujutsu, I've waited until after GHSA-794x-2rpg-rfgr was published (and ended up being delayed further myself--thanks to everyone for bearing with the extra delay).

I was mildly surprised to find that the GitHub API provides no mechanism to retrieve the comments from an advisory on a repository (even if one has full access to them). The following is therefore constructed by copy and paste, rather than programmatically. The ordering should be preserved perfectly, of course, but the dates and times are not present, nor are emoji reactions or intervening events other than comments. This was not too cumbersome, and I've tried to check for mistakes and ensure I didn't miss any comments--but if anyone notices anything wrong then please let me know. The links to the original comments will not work except for people who were added as collaborators on the advisory.


Comment 1 by @emilazy:

I recommend mitigating this vulnerability by switching to the sha1-checked crate (configured as shown in the PoC, to match Git: mitigated “safe hashes” disabled and collision results treated as a fatal error) and removing support for sha1-smol and sha1. sha1-checked builds on top of the sha1 crate, but has a performance penalty. I expect it is still faster than sha1-smol, though.

Comment 2 by @emilazy:

By the way, since I stumbled upon this while working on Jujutsu and Jujutsu is an affected downstream user that will need to issue their own release and advisory once it’s fixed, would you be willing to add the Jujutsu security contacts (@martinvonz, @yuja, @torquestomp) to the collaborators here to give them advance notice of this issue? I’ve mentioned in private that I found a gitoxide security issue that affects Jujutsu but haven’t disclosed any details yet.

Comment 3 by @Byron:

Thanks a lot for reporting!

It's a known issue as well, and I conveniently pushed the issue aside when realising that either there were no mitigations, or by now, that the mitigation would incur a serious performance penalty. This will be most noticeable when cloning a repository as each object in the pack will be decompressed and re-hashed to generate the pack index.

However, now that the advisory is here, I think it's time to act as well and am happy about the nudge.

Your help would be appreciated in mitigating the issue using the available crate, and removing implementations that aren't offering it. This will probably lead to portability issues, as anything based on sha1 won't work in all places, which is also why it's not in max-performance-safe. This is probably the biggest issue.

Thinking about future improvements, gitoxide would have to do what Git does and link to the OS crypto libraries which all come with mitigations while being as fast as this can possibly get. When that happens, performance would be competitive again, and the portability should be restored.

For the time being, to not lose portability, I think we should consider changing the defaults to be safe, but allow an escape hatch to use the slow-but-portable smol implementation.
Alternatively, maybe someone has the resources to add collision detection to the smol implementation.

These are just my first thoughts; everybody is welcome to chime in to develop a path to mitigation that minimises downstream churn and inconvenience. Thank you.

Comment 4 by @emilazy:

Thanks, I see that #585 is already public, so indeed probably no great need for secrecy here. However, SHA-1 collisions are getting cheaper and cheaper to produce and more severe; the Shambles chosen‐prefix OpenPGP collision cost $75k in 2020, and GPUs are certainly more powerful than they were half a decade ago. So I do think this really ought to be taken seriously, especially given that the SHA1DC algorithm can detect all the known collisions with a fairly general technique.

I am happy to put in some work to try and get this fixed. I will give a more detailed reply including benchmarks and toolchain considerations, hopefully tomorrow; there are promising developments on the portability front. Note that platform libraries won’t help here, unfortunately (SHA-1 is a standard function defined for all outputs, so cryptography libraries won’t just replace it with a partial one that rejects some inputs, or one that has non‐standard outputs), and Git deliberately doesn’t use them by default for this reason. Anyway, more to come when it’s not so late :)

Comment 5 by @emilazy:

Alright, that ended up being a very long “tomorrow”, but you know how it goes.

tl;dr: I’ve successfully adapted the codebase to do collision detection; raw hashing performance is between 0.26× and 0.64× the status quo depending on various factors, but hopefully real‐world performance impact is more limited as hashing shouldn’t be the bottleneck for every operation; there should be no portability concerns or native toolchain dependencies; I think there is potential for future performance improvement but it would take a little work.

So, first the good news about your concerns; then the bad news about why the good news doesn’t matter and the performance is sad; then what I’ve done about all of this.

The good news is: although sha1-asm does indeed require a native toolchain to build and doesn't work with MSVC, that’s not really relevant for high performance in practice. At least on x86, the portable non‐SIMD assembly code in that crate results in something that measures slower than sha1_smol on my machine! The big performance boost is provided instead by the SHA instruction set extension, which is implemented directly inside sha1 using intrinsics, with no toolchain requirement. As of the 0.11.0 pre‐releases of the sha1 crate, sha1-asm has been eliminated entirely. So I think there would be no obstacle to standardizing on one library here.

However, the bad news is: although the sha1-checked crate builds on top of sha1, it doesn’t use the accelerated implementation at all, because the x86 SHA instruction set extension runs four rounds at a time, which is incompatible with the collision detection algorithm’s need to check the state after every round. So there’s no high‐performance option here at all; it’s just tragically slow. The Sequoia PGP project’s sha1collisiondetection crate is based on a translation of the C library used by Git, and manages to be a bit faster, but the performance impact is still real and significant.

I believe there is room for improvement here; based on measurements of OpenSSL’s assembly implementation, I think a significant performance boost could be obtained by integrating the collision detection algorithm with an efficient SIMD‐accelerated implementation that doesn’t use the x86 SHA extension. The >2 GiB/s results are probably unattainable, but it seems likely that we could get performance in the ballpark of the status quo on systems without the SHA extension, and optimistically around half the current performance with the fast-sha1 feature enabled and the SHA extension available. (Maybe let’s say 0.8× and 0.4× respectively to give a little margin for pessimism.)

However, I still think this is worth addressing now, despite the performance penalty. Git has used the pure C implementation the Sequoia PGP crate is derived from since 2017, so they aren’t doing any better than us here. At the time, Linus said that he didn’t notice much slow‐down in practice due to factors other than hashing speed dominating most of the time, though based on some of the responses I suspect that it really depends on your workload.

Of course, gitoxide should aspire to be faster than Git. But in keeping with Rust culture, I think we should be as fast as possible while maintaining security and correctness properties, and no faster. We already have known chosen‐prefix collision attacks done on a research budget that broke OpenPGP and could have broken Git too; they will only get cheaper and more practical over time. The motivation and resources to implement a faster version of the collision detection algorithm aren’t likely to appear if we keep ignoring the problem. (I’d be interested in having a crack at it myself, despite my lack of experience with SIMD, but probably can’t justify putting unpaid time into it at present.)

Sadly, OpenSSL and platform APIs don’t help us here; nothing except Git and OpenPGP is really still using SHA‐1 for anything that matters, so desire to implement the non‐standard collision‐detecting variant in common cryptographic APIs has been very limited, and it seems like every available option is directly based on the original C library without significant further optimization work. So even if we were happy to integrate with system libraries or sacrifice portability, there’s not much better we can do without putting in that work to speed up the algorithm.

I’ve put up my work to fix this at https://github.com/GitoxideLabs/gitoxide-ghsa-2frx-2596-x5r6/pull/1. The commits should be well‐factored, with tests hopefully passing after every one, and the commit messages contain even more detail than this comment. We can do code review there if you have comments.

I chose sha1collisiondetection as it’s faster than sha1-checked and already relied upon by Sequoia PGP, though if any further time is put into performance work I would recommend sha1-checked as a base, since the code is much nicer by virtue of not being derived from translated C code. The fix involves breaking API changes, though I think the practical impact on downstream users should be limited, and if you wanted to strictly avoid them then the security risk could be mitigated in older versions by panicking when a collision is detected rather than handling it gracefully.

Unfortunately GitHub Actions won’t run on temporary private forks, but we can probably move this to a public PR for final checks if you think the code looks good, since the issue is already documented in SHORTCOMINGS.md anyway.

For the record, here’s the amateur benchmark code I used to compare the various options:

use std::time::Duration;

use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};
use sha1_checked::{CollisionResult, Digest as _};

#[inline]
fn sha1_smol(input: &[u8]) -> [u8; 20] {
    let mut hasher = sha1_smol::Sha1::new();
    hasher.update(input);
    hasher.digest().bytes()
}

#[inline]
fn sha1(input: &[u8]) -> [u8; 20] {
    let mut hasher = sha1::Sha1::new();
    hasher.update(input);
    hasher.finalize().into()
}

#[inline]
fn sha1_checked(input: &[u8]) -> Result<[u8; 20], [u8; 20]> {
    let mut hasher = sha1_checked::Builder::default().safe_hash(false).build();
    hasher.update(input);
    match hasher.try_finalize() {
        CollisionResult::Ok(digest) => Ok(digest.into()),
        CollisionResult::Mitigated(_) => unreachable!(),
        CollisionResult::Collision(digest) => Err(digest.into()),
    }
}

#[inline]
fn sha1collisiondetection(input: &[u8]) -> Result<[u8; 20], [u8; 20]> {
    let mut hasher = sha1collisiondetection::Builder::default().safe_hash(false).build();
    hasher.update(input);
    let mut digest = Default::default();
    if hasher.finalize_into_dirty_cd(&mut digest).is_ok() {
        Ok(digest.into())
    } else {
        Err(digest.into())
    }
}

fn sha1_crates(c: &mut Criterion) {
    const KIB: usize = 1024;
    const MIB: usize = 1024 * KIB;

    let mut group = c.benchmark_group("SHA-1");
    group.measurement_time(Duration::from_secs(20));
    for size in [
        16, 128, 4 * KIB, 16 * KIB, MIB, 64 * MIB,
    ] {
        let input = vec![123; size];
        group.throughput(Throughput::Bytes(size as u64));
        group.bench_with_input(BenchmarkId::new("sha1_smol", size), &input, |b, input| {
            b.iter(|| sha1_smol(input))
        });
        group.bench_with_input(BenchmarkId::new("sha1", size), &input, |b, input| {
            b.iter(|| sha1(input))
        });
        group.bench_with_input(
            BenchmarkId::new("sha1-checked", size),
            &input,
            |b, input| b.iter(|| sha1_checked(input)),
        );
        group.bench_with_input(
            BenchmarkId::new("sha1collisiondetection", size),
            &input,
            |b, input| b.iter(|| sha1collisiondetection(input)),
        );
    }
    group.finish();
}

criterion_group!(benches, sha1_crates);
criterion_main!(benches);

Comment 6 by @Byron:

Thanks a million for making this happen, and apologies for the late response!

The performance penalty is a pity, but there is no question about what to favour here.
At least on macOS, somehow, the OS Git (Apple Git) has a hash implementation that feels no slower than the sha1-asm version gitoxide has access to. From what you say this should be hard to achieve, so maybe they do something special. To my mind this means there may be hope of improving performance at some point.
Also, having a slow SHA1 implementation may be a motivation to accelerate SHA256 support in gitoxide.

Since the issue is no secret, i.e. it's public information that SHA1 in gitoxide doesn't have collision detection, may I ask you to submit your private fork changes as PR? There we can go over it.

The real-world SHA1 performance test is what happens right after a clone/fetch: index generation. This can be simulated with this command:

❯ gix --verbose no-repo pack verify ./tests/fixtures/repos/linux.git/objects/pack/pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.idx
 08:48:33 Hash of index 'pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.idx' done 212.8MB in 0.12s (1.8GB/s)
 08:48:33                                           collecting sorted index done 7.6M entries in 0.49s (15.6M entries/s)
 08:48:33 Hash of pack 'pack-3ee05b0f4e4c2cb59757c95c68e2d13c0a491289.pack' done 1.4GB in 0.74s (1.8GB/s)
 08:48:35                                                          indexing done 7.6M objects in 1.88s (4.1M objects/s)
 08:48:43                                                         Resolving done 7.6M objects in 7.47s (1.0M objects/s)
 08:48:43                                                          Decoding done 96.0GB in 7.47s (12.9GB/s)

I recommend trying it with one of these super-dense linux kernel packs as they are obtained directly from the linux kernel repository ( git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git) - the GitHub versions are usually far bigger.

This will also test the 'many small objects hashed in short succession' performance.
Overall, this operation is one of the gems in gitoxide as it's just so much faster than what Git can do, and it's majorly constrained by the SHA1 performance.

In any case, I will be looking forward to that PR, and we can probably get it merged without much fuss. The only question for me will be whether the chosen SHA1-with-collision-detection crate holds up when tested against the command above.

Comment 7 by @EliahKagan:

@Byron

However, now that the advisory is here, I think it's time to act as well and am happy about the nudge.

Since the issue is no secret, i.e. it's public information that SHA1 in gitoxide doesn't have collision detection, may I ask you to submit your private fork changes as PR? There we can go over it.

I agree with both of these things: the time to fix this is now (since emilazy has written a patch already, but for other reasons too), and also it is fine to move the PR to the public repo where review comments will survive (comments in temporary private fork PRs are deleted with the forks when publishing an advisory) and where CI will run.

However, although I totally agree that it is okay to discuss and work on this openly, I also strongly believe it should still be treated as a gitoxide vulnerability in the ways that increase openness: the draft advisory here should not be abandoned; it should be revised to list affected crates and versions, and a CVE should be requested; this advisory should be published as usual for GHSA advisories, though publishing it could wait until patched crate versions can be listed, unless there are substantial further delays, which I would not expect; and it should be followed up by a corresponding RUSTSEC advisory.

(I suspect you may already intend all this, such that I am "preaching to the choir." If so--if you are already convinced--then you might be less interested in the following details. But they still serve as a retrospective, as well as an acknowledgement of my own role in failing to recognize how the absence of SHA-1 collision detection constituted a vulnerability.)

Ease of computing collisions

Although the absence of SHA-1 collision detection was known, this nonetheless qualifies as a report of a previously unrecognized gitoxide vulnerability, because it establishes something as a vulnerability that we had not been treating as one before. That this is a vulnerability--rather than merely the absence of an optional security feature that would be nice to have--is something we did not recognize until now.

The report here, and commit messages in the patch, articulate risks of failing to protect against SHA-1 collisions that were not previously considered, related to the computational resources and monetary cost required to carry out an attack. As far as I can tell, the issue of the diminishing cost to find collisions was not examined last year in #585. It was also not examined in the more recent discussion at #1566 (review), nor in the brief email conversation that followed it.

Expectations compared to Git - build configuration

In addition, it looks like a misconception about hardened SHA-1 in Git contributed to this not being viewed as a vulnerability before--and it is a misconception that this report corrects. At one point, we had wrongly thought Git did not use hardened SHA-1 when built with default options. #585 (comment) links to a section of the Git Makefile that, at the time it was posted, appears to have been this section at this commit (which seems unchanged in the current version) and, based on it, says:

TLDR: use a fast optimized implementation, unless specific build flags are specified. So my understanding is that they effectively do not use the collision detection code in any of the default builds, which is quite surprising to me. Not sure where else to look for confirmation on that.

The Git Makefile contains self-contradictory comments on this issue. The idea that Git doesn't default to a hardened SHA-1 seems based on these two fragments:

# ==== Default SHA-1 backend ====
#
# If no *_SHA1 backend is picked, the first supported one listed in
# "SHA-1 implementations" will be picked.
# ==== SHA-1 implementations ====
#
# Define OPENSSL_SHA1 to link to the SHA-1 routines from the OpenSSL
# library.

However, after the three *_SHA1 settings are described, it says:

# If don't enable any of the *_SHA1 settings in this section, Git will
# default to its built-in sha1collisiondetection library, which is a
# collision-detecting sha1 This is slower, but may detect attempted
# collision attacks.

This latter claim--that a default build of Git does detect SHA-1 collisions--is the intended and correct meaning. The commit message where the "If no *_SHA1 backend is picked" comment was introduced (git/git@ed605fa) reveals that this comment intended to make an altogether different claim. The relevant part of that message reads:

For the *_SHA1 and *_SHA256 flags we've discussed the various flags, but not the fact that when you define multiple flags we'll pick one.

Which one we pick depends on the order they're listed in the Makefile, which differed from the order we discussed them in this documentation.

Let's be explicit about how we select these, and re-arrange the listings so that they're listed in the priority order we've picked.

In contrast, the commit message for the commit adding the comment that claims SHA-1 collisions are detected by default (git/git@d00fa55) reinforces that this is the intended meaning:

Let's mention the SHAttered attack and more generally why we use the sha1collisiondetection backend by default, and note that for SHA-256 the user should feel free to pick any of the supported backends as far as hashing security is concerned.

The report/advisory here, as well as the commit messages in the patch, get all this right.

Expectations compared to Git - significance of sha1collisiondetection being a submodule

As of #1566 (review), I was aware of the confusing wording in the Git Makefile, but I hadn't looked into the commit history to resolve it. However, if I recall correctly, after that, you had inquired as to whether Git enabled hardened SHA-1 by default, found that it did, and let me know.

At that point, I should have recommended that gitoxide's lack of it be treated as a vulnerability. But in addition to not checking how practical it was to find collisions, it looks like I had underestimated how strongly Git tends to be built with SHA-1 collision detection enabled. This is because I was under the wrong impression that SHA-1 would not be hardened if the sha1collisiondetection submodule was not checked out. (This false belief of mine can be observed in #1622, for example.)

As documented for the DC_SHA1_* build options, in the part added in git/git@86cfd61, the submodule is not used by default. Rather, the default is to use the implementation in the sha1dc directory. That's available even if the submodule isn't checked out.

Perceived design burden

This is the least significant factor of those that seem to have contributed to not recognizing the lack of SHA-1 collision detection as a vulnerability, and I mostly include it for completeness.

In #585 (comment), you had said it looked like Git generated a different, safe hash value when a collision could occur, rather than simply failing. This then seemed to require a design decision about whether to support alternate hashes, and the possible problem of ensuring that they would always be the same hashes and including test cases to verify that.

This was inferred from the documentation for the SHA1DCSetSafeHash function in sha1dc/sha1.h, which indicates that generating an alternate safe hash for inputs suspected to be part of a collision is done by default. But this is not actually describing the default behavior of Git. Instead, the sha1dc directory in the Git source code is a vendored copy of sha1collisiondetection/lib, and the comment describes that library's own default behavior (at the revision that was copied into Git).

In git/git@c0c2006, Git modified this vendored copy so that SHA1DCInit sets ctx->safe_hash to 0 instead of 1. Subsequently, a Git developer contributed a patch to the upstream sha1collisiondetection library (cr-marcstevens/sha1collisiondetection@b45fcef) whose improvements included the ability to customize the default per-build with the preprocessor symbol SHA1DC_INIT_SAFE_HASH_DEFAULT. The copy vendored in Git was updated to match that version (git/git@a010391). Git's make, cmake, and meson build scripts set it to 0.

The remaining case is when an external sha1collisiondetection library is used. Git has supported this since git/git@3964cbb. Such a library may (and usually would) be built with a default of 1. But Git still changes it back to 0 when initializing it at runtime.

Thus, unless one explicitly opts out of SHA-1 collision protection altogether when building Git, Git never actually computes and uses alternate safe hashes, but instead allows SHA-1 hash computation to fail if the input is suspected to be part of a collision, and treats such failure as an error. The relevant die calls appear to be this call in sha1dc_git.c with a "SHA-1 appears to be part of a collision attack" message, and the five calls in builtin/index-pack.c with a "SHA1 COLLISION FOUND WITH" message.

This is also something the report and patch here get right, in the description in the advisory of how Git behaves, and in making the operation fallible.

Downstream assumptions

Per that comment, the susceptibility to SHA-1 collisions that jj inherits from gitoxide is to be considered a jj vulnerability. This provides two reasons to regard the situation in gitoxide as a vulnerability for which a GHSA advisory, CVE, and RUSTSEC advisory should be issued:

  • If I understand correctly, it had been assumed in jj that OIDs were computed as securely in gix-* crates as in Git. Although this was contradicted in SHORTCOMINGS.md, it suggests, especially when considered together with #55 (comment), that in practice downstream applications regard the absence of SHA-1 collision protection to be a security vulnerability for their application.
  • As a practical matter, it is useful to have advisories so that applications (including jj) that issue their own advisories can have an advisory to link to as the root cause (and that readers of downstream advisories can be informed of, to learn more).

Comment 8 by @EliahKagan:

@emilazy

Thank you for reporting this and writing a patch!

I have a few thoughts regarding possible minor revisions to the advisory (though I am not confident that any are preferable to what currently stands), followed by details about a new journey test failure.

Possible revisions

Credit

I don't know if you prefer to be credited as "Reporter" (as currently selected) or "Remediation developer" but I think they are equally correct in this case. No change is needed here; I mention this only in case you prefer the latter.

CVSS

The draft advisory currently indicates a score of CVSS:3.1/AV:N/AC:H/PR:N/UI:N/S:C/C:N/I:H/A:N for the vulnerability. I have two questions about this:

  • Is this really a scope change? OIDs are used in various ways, and if an attacker can substitute a different object from what was intended, then this could affect integrity in ways that might have elevated stakes, since GPG-signed tags or commits could effectively incorporate attacker-controlled data. But I am not sure if this is sufficient to count as a scope change under CVSS 3.1. I am also not sure if this is what you mean.
  • Is it any easier to compute an input that looks like it participates in a collision than to compute one that is actually part of a collision? If so, this vulnerability might have an availability impact of low rather than none, even separately from the effect on integrity (which would of course still be high). I suspect I am mistaken about this. In particular, I do not actually know that it is any easier to compute inputs that look like they participate in collisions than actual collisions. But I figured I'd mention it just in case.

The cost to find a collision is even lower than suggested

In https://github.com/GitoxideLabs/gitoxide-ghsa-2frx-2596-x5r6/pull/1/commits/4f2661b8cfdbab6af1ba860b10c004cc8d4726fd, you rightly note that:

the SHAttered attack demonstrated a collision back in 2017, and the 2020 SHA‐1 is a Shambles attack demonstrated a practical chosen‐prefix collision that broke the use of SHA‐1 in OpenPGP, costing $75k at the time, with an estimate of $45k at the time of publication. Given the increase in GPU performance and production since then, that puts the Git object format squarely at risk.

This is correct, and that commit message does not need to be changed. But it seems to me that even this may be understating the ease of finding a usable collision.

If I understand correctly, a chosen-prefix collision was far more usable to break OpenPGP than could be achieved with an identical-prefix collision, but even an identical-prefix collision might be usable to cause significant harm in Git repositories. An attacker could (for example) produce two commits, one of which contains harmful code, offer the innocuous commit in a PR/MR, then distribute a copy of the repository with subsequent signed tags or commits that have it in their history, but substitute the harmful commit.

The SHA‐1 is a Shambles website mentions:

As a side note, a classical collision for SHA-1 now costs just about 11k USD.

This is backed up by the following portions of the SHA-1 is a Shambles paper:

We managed to significantly reduce the complexity of collision attacks against SHA-1: on an Nvidia GTX 970, identical-prefix collisions can now be computed with a complexity (expressed in terms of SHA-1 equivalents on this GPU) of $2^{61.2}$ rather than $2^{64.7}$, and chosen-prefix collisions with a complexity of $2^{63.4}$ rather than $2^{67.1}$. When renting cheap GPUs, this translates to a cost of US$ 11k for a collision, and US$ 45k for a chosen-prefix collision, within the means of academic researchers.

Cost analysis. We paid US$ 75.6k for our computation, but the cost could be as low as US$ 50k with currently lower GPU prices and less idle time. With the same methods, computing an identical-prefix SHA-1 collision would cost only about US$ 11k. This is clearly within reach of reasonable attackers.
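(For a sense of scale: those complexity figures are base-2 logarithms of SHA-1-equivalent work, so the improvement factors can be read off directly--roughly 2^3.5 ≈ 11× for identical-prefix collisions and 2^3.7 ≈ 13× for chosen-prefix ones, consistent with the quoted cost drops. A trivial sketch of that arithmetic:)

```rust
// The paper gives attack complexities as log2 of SHA-1-equivalent
// work, so the cost reduction of the improved attack is simply the
// ratio of the two work factors.
fn speedup(old_log2_cost: f64, new_log2_cost: f64) -> f64 {
    (old_log2_cost - new_log2_cost).exp2()
}
```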

If an attack cost US$ 11k in 2020, it is probably much cheaper even than that now. In addition to being a reason I am very happy about this security advisory and patch, I wonder also if this wording of the advisory should somehow be adjusted slightly, to avoid giving the impression that attackers face greater burdens than they do:

Since the SHAttered PDFs are not in a valid format for Git objects, a direct proof‐of‐concept using higher‐level APIs cannot be immediately demonstrated without significant computational resources.

But I have not really thought of a better wording. The best I have thought of is something like this, which I think is not very good:

Since the SHAttered PDFs are not in a valid format for Git objects, a direct proof‐of‐concept using higher‐level APIs is not immediately available.

The problem with that change is that it could be misread to imply that we are planning to produce such a collision.
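As an aside on why file-level collisions don't carry over directly: Git never hashes a file's raw bytes alone, but prefixes a "<type> <len>\0" header, which shifts the internal Merkle-Damgard state before the colliding blocks are processed--so the SHAttered PDFs would not collide as blobs even if they were otherwise valid objects. A minimal sketch of the (standard) preimage construction:

```rust
// Git object IDs are SHA-1 over "<type> <len>\0" followed by the
// payload -- never over the payload alone. Because SHA-1 is a
// Merkle-Damgard construction, the header changes the chaining state
// going into the colliding blocks, so a collision between two raw
// files does not automatically survive as a collision between the
// corresponding blobs.
fn git_object_preimage(kind: &str, payload: &[u8]) -> Vec<u8> {
    let mut preimage = format!("{kind} {}\0", payload.len()).into_bytes();
    preimage.extend_from_slice(payload);
    preimage
}
```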

Local testing

Almost everything passed

I didn't run all the tests that run on CI, but I ran:

  • cargo nextest run --workspace --no-fail-fast both with and without GIX_TEST_IGNORE_ARCHIVES=1, on Arch Linux x86-64, Windows 10 22H2 x86-64, macOS 15.3.2 arm64, and Ubuntu 22.04 LTS s390x. There were no new failures.
  • just test on Arch Linux and macOS 15.3.2. One journey test failed. It might be that only the test's expectations require adjustment, but I'm not certain.

The failing journey test

The gix free pack explode journey test fails, in each of the four feature configurations.

Here's the full output of just journey-tests-pure, and here's the full output of just journey-tests-small. The key part, which is identical in both, is:

-----------------------------------------------------
gix free pack explode
-----------------------------------------------------
     [with] the 'explode' sub-command
        [with] no objects directory specified
           [it] explodes the pack successfully and with desired output
           [when] using the --delete-pack flag
              [with] a valid pack
                 [it] explodes the pack successfully and deletes the original pack and index
                 [it] removes the original files

              [with] a pack file that is invalid somewhere
                 [with] and all safety checks
                    [it] does not explode the file at all
4,5c4,7
<     0: Index file, pack file or object verification failed
<     1: index checksum mismatch: expected f1cd3cc7bc63a4a2b357a475a58ad49b40355470, got 337fe3b886fc5041a35313887d68feefeae52519
\ No newline at end of file
---
>     0: Object 4c97a057e41159f9767cf8704ed5ae181adf4d8d at offset 12759 could not be decoded
>     1: Failed to decompress pack entry
>     2: Could not decode zip stream, status was 'DecompressError(General { msg: None })'
>     3: deflate decompression error
\ No newline at end of file

Here's the full output of just journey-tests-async, and here's the full output of just journey-tests. The key part, which is identical in both of those but slightly different from the first two shown above, is:

-----------------------------------------------------
gix free pack explode
-----------------------------------------------------
     [with] the 'explode' sub-command
        [with] no objects directory specified
           [it] explodes the pack successfully and with desired output
           [when] using the --delete-pack flag
              [with] a valid pack
                 [it] explodes the pack successfully and deletes the original pack and index
                 [it] removes the original files

              [with] a pack file that is invalid somewhere
                 [with] and all safety checks
                    [it] does not explode the file at all
4,5c4,5
<     0: Index file, pack file or object verification failed
<     1: index checksum mismatch: expected f1cd3cc7bc63a4a2b357a475a58ad49b40355470, got 337fe3b886fc5041a35313887d68feefeae52519
\ No newline at end of file
---
>     0: Error verifying object at offset 12759 against checksum in the index file
>     1: Hash should have been 4c97a057e41159f9767cf8704ed5ae181adf4d8d, but was a29ebd0e0fcbcd2a0842dd44cc7c22a90a310a3a
\ No newline at end of file

This is at https://github.com/GitoxideLabs/gitoxide-ghsa-2frx-2596-x5r6/pull/1/commits/4f2661b8cfdbab6af1ba860b10c004cc8d4726fd. The above output is from Arch Linux, since that is most similar to how we run them on CI, but I have verified that the same happens on macOS. I also verified that they do not have any failures in the base commit c8c42b4 on the main branch.

(The journey tests don't work as well on Windows, where they normally fail fast earlier in the process, for reasons not at all related to the patch here, so I didn't run the journey tests on Windows. Likewise, some journey tests on GNU/Linux on s390x have been failing in ways that seem related to #1890, so I didn't do the journey test experiment in s390x either.)

Comment 9 by @emilazy:

(Just wanted to say thanks for all the feedback and I’ll try to have a proper reply and a public PR with more benchmarks up tonight 💜)

Comment 10 by @emilazy:

It’s always nice to meet someone who writes as many paragraphs as I do 😆

I’ll have the public PR up tomorrow, with more benchmarks and tests and a few fixes.

However, although I totally agree that it is okay to discuss and work on this openly, I also strongly believe it should still be treated as a gitoxide vulnerability in the ways that increase openness: the draft advisory here should not be abandoned; it should be revised to list affected crates and versions, and a CVE should be requested; this advisory should be published as usual for GHSA advisories, though publishing it could wait until patched crate versions can be listed, unless there are substantial further delays, which I would not expect; and it should be followed up by a corresponding RUSTSEC advisory.

FWIW, I agree. I expect that RustSec would want to publish something about this even if gitoxide didn’t. Even though the limitation is documented, I didn’t see that documentation despite doing quite a lot of reading of the gitoxide documentation and some patching too; I would have expected it to be more prominently marked. (But of course it probably would have been, if the situation with Git had been better understood, so I’m not casting blame here.)

The advisory text will need modifying to address the fact that the bug is getting fixed and list the applicable versions, of course. I’ll try and amend it appropriately when I put up the PR, but I don’t know what the next version number will be or exactly which crates depend on hashing, so I hope you can help out there :)

By the way, in case it streamlines later publication as a RustSec advisory, I hereby release the current version of the advisory text and all future modifications I make to it under the CC0 1.0 Universal deed.

Per that comment, the susceptibility to SHA-1 collisions that jj inherits from gitoxide is to be considered a jj vulnerability. This provides two reasons to regard the situation in gitoxide as a vulnerability for which a GHSA advisory, CVE, and RUSTSEC advisory should be issued:

So, I can’t speak for the Jujutsu project – I’m an active user and contributor but not one of the core maintainers – but I would guess that if anyone working on Jujutsu had thought about Git’s SHA‐1 collision mitigations they’d have expected them to be in gitoxide too, yeah. The way I stumbled upon this in practice was looking into the max-performance situation in light of #1873, finding that the sha1 crate seemed to no longer depend on a C toolchain for accelerated hashing, and then thinking “wait, but Git uses a collision‐mitigated version of SHA‐1 and these libraries don’t seem to implement it; is it done inside gitoxide somehow?”, checking the code, and going ‼️. My understanding is that Jujutsu will issue an advisory for this once we update to a fixed gix version; the upcoming release is due next week, but a patch release is always an option if the schedules don’t align.

By the way, in terms of this being a breaking change, the good news is that I compiled a working Jujutsu against my patch with no source code changes. So my intuition that most users of the API won’t have to make any substantive changes seems accurate.

Re: everything about Git defaults and the library configuration, it’s very confusing, yes. I doubted myself about what their actual defaults are a few times when investigating this. Frankly, I think it’s unwise of them to offer build flags to “turn off the security”, but it seems like they introduced it as an experimental opt‐in mitigation, and then scrambled to make it the default when SHAttered happened. In practice it seems like distributors don’t mess with the default, thankfully.

the five calls in builtin/index-pack.c with a "SHA1 COLLISION FOUND WITH" message.

FWIW, this seems to be a separate check of whether an object from the network matches the contents of the same hash on disk, rather than the actual collision‐detection algorithm; I don’t know whether gitoxide implements it or not, but it doesn’t seem load‐bearing.

I don't know if you prefer to be credited as "Reporter" (as currently selected) or "Remediation developer" but I think they are equally correct in this case. No change is needed here; I mention this only in case you prefer the latter.

I believe that you can actually add the same person multiple times to credit different roles? But I don’t mind either way :)

Is this really a scope change? OIDs are used in various ways, and if an attacker can substitute a different object from what was intended, then this could affect integrity in ways that might have elevated stakes, since GPG-signed tags or commits could effectively incorporate attacker-controlled data. But I am not sure if this is sufficient to count as a scope change under CVSS 3.1. I am also not sure if this is what you mean.

Frankly, I have no idea! This is my problem with CVSS: it’s very detailed and looks terribly objective, but in practice you can get really strange results (e.g. due to it deliberately ignoring the likelihood of exposure), and because a lot of the criteria are vague and hard to interpret, people tend to start with a vague idea of how bad the resulting score should be, and then fudge the ambiguous factors until it looks right. That’s certainly part of what happened here.

My intent was to capture the fact that Git hashes are sometimes used in fairly load‐bearing ways for security in other systems; as you implied, the security model of signed tags or pinning resources based on commit hash is broken by SHA‐1 collisions. It’s probably not a great idea to rely on those without any defence‐in‐depth, but I expect that it happens, and to the extent that you’re relying on gitoxide to enforce that security boundary, then this vulnerability could lead to compromises of package managers and the like.

The two examples of a scope change that seem closest to this are:

  1. In a distributed environment, a vulnerability in a component providing connectivity, protection, or authentication services to components in a different security authority should be scored as a Scope change if a successful attack impacts these other components. For example, a vulnerability in a component such as a router, firewall, or authentication manager that affects the primary availability of one or more downstream components should be scored as a Scope change. However, if a successful attack either does not affect at all, or causes only negligible impact to components in a different security authority, the vulnerability should be scored as Scope unchanged. For example, a vulnerability in a component designed to be deployed as part of a larger fault-tolerant topology should not be scored with a changed Scope if the fault-tolerance means a successful attack does not affect components in different security authorities. Any effect on additional services provided by the vulnerable component is considered a secondary impact and not a scope change.
  1. A vulnerability in an application that implements its own security authority which allows attackers to affect resources outside its security scope is scored as a Scope change. This assumes the application provides no features for users to access resources governed by a higher-level security authority shared with other components across multiple security scopes (e.g., the resources of the underlying operating system). One example would be a web application that allows users to read and modify web pages and files only under the web application’s installation paths, and provides no feature for users to interact beyond these paths. A vulnerability in this application allowing a malicious user to access operating system files unrelated to this application is considered a Scope change.

But it’s a lot of guesswork, even beyond that metric. For instance:

  • Is the Attack Vector really Network, or is it Local because “the attacker relies on User Interaction by another person to perform actions required to exploit the vulnerability (e.g., using social engineering techniques to trick a legitimate user into opening a malicious document)”? (For anyone using gitoxide on the server side, it’s presumably Local, at least.)

  • Perhaps Privileges Required shouldn’t be None, as usually you need some amount of privilege to exchange Git objects? But not necessarily: e.g. a deduplicating backend of a public forge is certainly vulnerable to this.

  • Maybe there’s a way to compromise Confidentiality with this in some systems? Or Availability partially from shadowing objects?

I dunno. All I know is that 6.8 seems like it’s about the right ballpark :)

Is it any easier to compute an input that looks like it participates in a collision than to compute one that is actually part of a collision? If so, this vulnerability might have an availability impact of low rather than none, even separately from the effect on integrity (which would of course still be high). I suspect I am mistaken about this. In particular, I do not actually know that it is any easier to compute inputs that look like they participate in collisions than actual collisions. But I figured I'd mention it just in case.

I believe that the likelihood of random inputs triggering the collision detection is extremely low; the sha1collisiondetection repository says lower than 2^−90. I assume that the most efficient known way to generate an input that triggers it is to just try to generate a collision. I’m not sure if you are likely to find “near‐misses” that trigger the detection without actually establishing a full collision in the process.

But in any case I don’t think it would affect the score. The availability impact would be introduced by the fix, rather than the vulnerability, right? So it wouldn’t factor into CVSS, but also generally we call turning an integrity compromise into one of availability “good error checking” :)

(Or is the idea that a gitoxide server would happily accept objects that can be used to deny availability to Git clients? I suppose it’s possible, although it seems like you could think of better things to do with the attack funds.)

If I understand correctly, a chosen-prefix collision was far more usable to break OpenPGP than could be achieved with an identical-prefix collision, but even an identical-prefix collision might be usable to cause significant harm in Git repositories. An attacker could (for example) produce two commits, one of which contains harmful code, offer the innocuous commit in a PR/MR, then distribute a copy of the repository with subsequent signed tags or commits that have it in their history, but substitute the harmful commit.

Hmm, it seems I had the wrong idea of what a chosen‐prefix collision is. I assumed it meant picking a single prefix and deriving a suffix that leads to a collision, but apparently it means picking two separate prefixes, as you say. You certainly need a prefix including a Git object header to exploit this, but the SHAttered PDFs already had a small header before the collision. My understanding was that you needed something like what was used to attack OpenPGP to make practical use of this against Git, but if the SHAttered attack could be enough then that does seem pretty bad. I’ll think about how to reword the advisory in light of my newfound uncertainty.

(Thinking about it now, this seems like a pretty silly thing to assume – if the two plaintexts have the same prefix then the SHA‐1 state will be the same after both of them, so unless the initial state is “especially exploitable”, there’s no reason to believe adding a shared prefix would make it harder? But this is way beyond what I’m qualified to speculate about, honestly.)
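To make the “prefix including a Git object header” point concrete, here is a minimal sketch (plain Python `hashlib`, not gitoxide code) of how a Git blob ID is derived: the header is hashed before the content, so a colliding pair of payloads only yields colliding object IDs if both are hashed behind the same header, which is why any practical attack has to account for that prefix.

```python
import hashlib

def git_blob_oid(content: bytes) -> str:
    """Compute a Git blob object ID: SHA-1 over a "blob <len>\\0" header plus the content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

# Two payloads that collide as raw SHA-1 inputs do not automatically
# collide as Git objects: the object ID commits to the header as well.
print(git_blob_oid(b""))  # → e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, Git's well-known empty-blob hash
```

The same scheme applies to trees, commits, and tags with their respective header type strings.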

The problem with that change is that it could be misread to imply that we are planning to produce such a collision.

Hey, if anyone has some free GPUs and wants to get some headlines, I think an SHA‐1 attack against Git would be a great thing to demonstrate; it seems like seeing is believing with these things. Perhaps GitHub would finally implement SHA‐256 support? (I’d love to see SHA‐256 support in gitoxide, for what it’s worth!)

The failing journey test

Sorry about that. I ran all of the justfile unit tests but I couldn’t run the journey tests because of #1854 (comment), which I still don’t really understand. I might have to do some CI whack‐a-mole.

I don’t understand the “deflate decompression error” change and wonder if it might not be an existing problem; I saw CI was failing on PRs recently. The other one makes more sense – the error messages and nesting will indeed change from the generic hash verification API I implemented – though why the hashes involved changed I can’t say. It’s possible I introduced a bug here and I’ll take a closer look at it tomorrow.

Comment 11 by @Byron:

The advisory text will need modifying to address the fact that the bug is getting fixed and list the applicable versions, of course. I’ll try and amend it appropriately when I put up the PR, but I don’t know what the next version number will be or exactly which crates depend on hashing, so I hope you can help out there :)

Once the PR is up the change in gix-hash can be marked with fix!: …, then cargo smart-release will produce all the new version numbers of gix-hash and all dependent crates if they were released.

Re: everything about Git defaults and the library configuration, it’s very confusing, yes. I doubted myself about what their actual defaults are a few times when investigating this. Frankly, I think it’s unwise of them to offer build flags to “turn off the security”, but it seems like they introduced it as an experimental opt‐in mitigation, and then scrambled to make it the default when SHAttered happened. In practice it seems like distributors don’t mess with the default, thankfully.

❯ git --version
git version 2.39.5 (Apple Git-154)

❯ otool -L `which git`
/usr/bin/git:
        /usr/lib/libxcselect.dylib (compatibility version 1.0.0, current version 1.0.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1351.0.0)

I believe this is an Apple-provided build as it comes with Xcode, and I am not at all clear which SHA-1 implementation it uses. It's obviously fast enough to be the unmitigated version, but it's also possible that they link to a hand-optimized mitigated version that is part of macOS somewhere. This gives me hope that one day gitoxide could use something similar. To find out experimentally, we'd need a shattered Git object, which doesn't seem to exist yet.

FWIW, this seems to be a separate check of whether an object from the network matches the contents of the same hash on disk, rather than the actual collision‐detection algorithm; I don’t know whether gitoxide implements it or not, but it doesn’t seem load‐bearing.

I don't think it implements a 'multiple objects in pack hash the same' check. If that went undetected, an index with multiple consecutive identical hashes (but different offsets) would be created. In theory, this will still find the 'right' object of which multiple are stored in the pack, unless it's shattered of course. But if it is shattered, the attacker gains no advantage by sprinkling in multiple versions of the object, so I'd agree that it's not load bearing.
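The check described above is cheap once index entries are sorted by OID, since duplicates are then necessarily adjacent: a single pass over neighbouring entries finds them. A minimal sketch of the idea (illustrative Python, not the actual gitoxide pack-index code):

```python
def find_duplicate_oids(entries):
    """Given (oid, offset) pack-index entries, return the oids that occur
    at more than one offset, i.e. distinct pack entries sharing one hash."""
    entries = sorted(entries)  # sort by oid, then offset; duplicates become adjacent
    dupes = []
    for (oid_a, off_a), (oid_b, off_b) in zip(entries, entries[1:]):
        if oid_a == oid_b and off_a != off_b:
            dupes.append(oid_a)
    return dupes

index = [("aaaa", 12), ("bbbb", 40), ("bbbb", 97), ("cccc", 150)]
print(find_duplicate_oids(index))  # → ['bbbb']
```

Since pack indexes are already sorted by OID on disk, such a check adds only a linear scan during index creation.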

Comment 12 by @emilazy:

Updated PR hopefully up tonight. Sorry this has dragged out longer than expected!

Comment 13 by @emilazy:

I have the updated PR up at #1915. I’ll try to get the journey tests happy again based on the Ci results and amend the advisory text appropriately within the next day or two, but the code should be ready for review.

Comment 14 by @emilazy:

I’ve updated the advisory text to account for the upcoming fix. I haven’t filled in the projected non‐vulnerable versions since I found cargo-smart-release’s output a bit hard to get a clear list out of; hopefully one of you can handle that :)

Comment 15 by @Byron:

Thanks! So far I was lucky enough to have @EliahKagan manage these. If this isn't possible here I will (try to) do it myself and request a CVE, too.

Comment 16 by @EliahKagan:

I'll add some crates and versions shortly, though they may require revision. I'll also comment again regarding that, and some other stuff, shortly thereafter. (Due to dependency relationships, we might only need a RUSTSEC advisory for gix-hash. But for the advisory here, I think at least gitoxide should be listed here as well as gix-hash. Furthermore, I lean toward going much further: not listing all dependent crates, but listing a number of other crates that engage in a substantial way with the details of SHA-1 use and verification. But if I go overboard then some can be of course removed, and this can probably be figured out the rest of the way soon.)

Comment 17 by @EliahKagan:

@emilazy

Advisory revisions: metadata

I've added some affected crates. I think we should likely add more, but I am not certain. However, I believe the listed crates are enough to allow things to move forward with requesting a CVE, which I recommend be done. (Even though I have write permissions on the gitoxide repository, I think only Byron can request a CVE.)

Sometimes the crates listed in an advisory on GitHub will be the same as those that have RUSTSEC advisories, but not always. It might be that the only crate we will need a RUSTSEC advisory for is the gix-hash crate. That will be the case if all other vulnerable crates require gix-hash and depend on versions of it that the advisory covers. But I think the advisory here can reasonably list more crates.

RUSTSEC advisories are always for individual crates. So if multiple crates need RUSTSEC advisories, they have to be separate RUSTSEC advisories. This is usually avoided when not necessary. In contrast, no such limitation applies to advisories on GitHub, which can list any number of packages as affected by a single advisory.

I believe that you can actually add the same person multiple times to credit different roles? But I don’t mind either way :)

Either that is not supported, or I am not doing it correctly. When I attempt to select you again to add you as "remediation developer" separately from "reporter," it says "Already added":

Advisory credit entry in editing interface for user emilazy, showing an "Already added" label on the right side

Advisory revisions: body text

I've made a minor edit to the "Impact" section. This edit is limited to changes that I believe clarify the existing intent. If you think my changes are incorrect or otherwise not preferable, then please re-edit and/or let me know.

I’ve updated the advisory text to account for the upcoming fix.

Sorry about not replying earlier on this. I think the advisory text doesn't really need to describe the problem in past tense, nor otherwise reflect (except possibly in the "Impact" section) that the bug is getting fixed. The metadata showing affected crates and the versions where the bug is fixed should be enough for that. Most security advisories describe affected software as it is, or was, in versions affected by the vulnerability.

But I also think this is mostly subjective and that it is better to go with your preference, if any. Advisories can be, and are, written in various styles. Accordingly, I have not at this time made changes related to that. I do not plan to edit the advisory text myself to change this (unless you ask me to do so).

In the "Summary" section, the phrase "Previous versions of gitoxide" seems misleading to me. Although "gitoxide" has multiple meanings, such as the project as a whole, the only entity called "gitoxide" that has versions is the gitoxide crate. This is not the only crate affected by the vulnerability--some gix-* library crates are also affected. This is arguably minor in that the advisory metadata will list the affected crates--but that is also an argument for not mentioning the concept of versions in the summary at all.

(If going back to having the advisory implicitly describe only affected versions and using the present tense, that would automatically address this by restoring the old wording. But if the rest of the advisory is to remain in the past tense, then this can still be addressed fully, just by removing the "Previous versions of gitoxide" part and having the summary read: "gitoxide used SHA-1 hash implementations without any collision detection, leaving it vulnerable to hash collision attacks.")

This advisory and the PRs

As of this writing, there are two PRs open: the original PR on the temporary private fork here, and the public PR.

If I understand correctly, the public PR fully supersedes the original PR on the temporary private fork here. If that is the case and if there is nothing that needs to be preserved here in the temporary private fork, then the PR on the temporary private fork should be closed. This is because the PR has to be closed for the temporary private fork to be deleted, which occurs as part of the process of publishing this advisory.

Coordinating RUSTSEC advisory

By the way, in case it streamlines later publication as a RustSec advisory, I hereby release the current version of the advisory text and all future modifications I make to it under the CC0 1.0 Universal deed.

Thanks. That is helpful, because the advisory text you have written is substantial and of great value, definitely including for the way that it explains things, so it is a good idea for it to be used as the text of the RUSTSEC advisory as well (or multiple RUSTSEC advisories, if having one for gix-hash ends up not being enough).

In principle RUSTSEC advisories can be created by importing from the GitHub Advisory Database. But that at best seems to incur a substantial delay, and a RUSTSEC advisory created that way may sometimes be missing some kinds of information that may be useful to include in its metadata. The practice we've been following in gitoxide is to contribute RUSTSEC advisories explicitly rather than relying on them being imported. RUSTSEC advisories other than those imported from GHSA are offered under CC0.

So your releasing the text in this way allows me to create the RUSTSEC advisory (or advisories) with the text by opening a PR on the rustsec/advisory-db repository. I think this should be done at or shortly after this advisory on GitHub is published. Even though we don't need secrecy in this case, waiting to open it will allow the correct crate version with the fix to be specified in its metadata.

I don't know if you prefer to open that PR. If so, then that is definitely fine, and I'd be pleased to help out in any way, such as by reviewing it (though I do not presume that my help would be needed). But if you do not prefer to do so, then I'd be pleased to do it. Please let me know. (If I open the advisory-db PR, then I'll include you in the Co-authored-by: trailer in a commit that adds the material of this advisory's text.)

Messages from Git that mention SHA-1

FWIW, this seems to be a separate check of whether an object from the network matches the contents of the same hash on disk, rather than the actual collision‐detection algorithm; I don’t know whether gitoxide implements it or not, but it doesn’t seem load‐bearing.

Thanks. You're right about those five messages (and this also explains why bugs not related to SHA-1 have been able to produce them in the past). Sorry about the confusion.

CVSS score

This is my problem with CVSS: it’s very detailed and looks terribly objective, but in practice you can get really strange results (e.g. due to it deliberately ignoring the likelihood of exposure)

I think CVSS 4.0, which is also available for advisories, tries to improve on this. But I have no experience using it. In any case I think the ambiguity is not too severe here.

and because a lot of the criteria are vague and hard to interpret, people tend to start with a vague idea of how bad the resulting score should be, and then fudge the ambiguous factors until it looks right.

Yes, I think this is technically discouraged, but very common and it is hard to avoid the pull to do it, and this is something I have done.

My intent was to capture the fact that Git hashes are sometimes used in fairly load‐bearing ways for security in other systems; as you implied, the security model of signed tags or pinning resources based on commit hash is broken by SHA‐1 collisions. It’s probably not a great idea to rely on those without any defence‐in‐depth, but I expect that it happens, and to the extent that you’re relying on gitoxide to enforce that security boundary, then this vulnerability could lead to compromises of package managers and the like.

The two examples of a scope change that seem closest to this are[...]

Yes, this makes sense.

What you are saying also makes me realize I had not recognized the relevance of the history of claims that the use of SHA-1 in Git was not for security. Such claims have been inaccurate or misleading, but they reflect a design intent for Git repositories where OIDs have been seen as separate from applications where authenticity is required. This further supports viewing this vulnerability as a scope change.

But it’s a lot of guesswork, even beyond that metric. For instance:

While in general I agree there is a lot of guesswork, and that CVSS is flawed and limited in the ways you have described, in this case it seems to me that most of the other selections have decisively correct choices (and that they are the choices you have selected).

Is the Attack Vector really Network, or is it Local because “the attacker relies on User Interaction by another person to perform actions required to exploit the vulnerability (e.g., using social engineering techniques to trick a legitimate user into opening a malicious document)”?

I think this is Network. This is for two reasons, though they overlap in that they are both based on the idea that the vulnerability has an impact as soon as a loss of integrity occurs, even before any subsequent operation that assumes integrity.

First, although the phrase "bound to the network stack" in the CVSS 3.1 specification is confusing, I think this vulnerability is analogous to a vulnerability in a web browser that is triggered by loading a specially crafted web page, which would be scored as network since the page can be retrieved over a network and no further user interaction beyond the action that initially navigates to it is required.

Whether this analogy holds up depends on whether a SHA-1 collision in objects offered by a remote repository is ever supposed to be caught in a clone (or fetch) operation, including in the checkout done automatically in a clone. If that is expected, and if in the absence of a fix it is not caught, then a loss of integrity has already occurred, even prior to any subsequent operation in which the most harmful potential effects of that loss of integrity could occur.

(By comparison, we scored CVE-2024-35186 as having a "network" attack vector. There, the initial loss of integrity--writing a file to an arbitrary location that the user running the program could access--occurred in the checkout performed as part of cloning. This was even though some of the subsequent bad effects--executing a payload in a .git/hooks or $PATH directory--would usually not occur without subsequent interaction.)

Second, the affected library crates can reasonably be used in a wide variety of applications, and the CVSS score being given here is specifically a CVSS base score. For a library, a CVSS base score encompasses what the vulnerability can reasonably be expected to produce in an application that uses the library in a reasonable configuration. Operating on Git repositories is generally expected to be done with data retrieved over a network, sometimes including in ways that are conceptually part of the same operation as receiving those data. In this vulnerability, as in CVE-2024-35186 and CVE-2024-35197, the core vulnerability does not fundamentally require local access or rely on additional steps.

(For anyone using gitoxide on the server side, it’s presumably Local, at least.)

I don't follow that part, unless you mean to say that on the server side it is presumably remote.

Perhaps Privileges Required shouldn’t be None, as usually you need some amount of privilege to exchange Git objects?

I think one can get a relevant effect when cloning repositories or when fetching and checking out branches that offer patches. An attacker who offers their own remote does not need privileges on the client who uses the remote. Likewise, even if a remote that is separately vulnerable to SHA-1 collisions requires that the attacker have an account on it to use it, cloning from that remote with a vulnerable client largely does not entail trusting that remote. (Cloning a repository is intended to be secure so long as one does not then take a separate step to run something from the repository without fully inspecting it and any resources it uses.) These may not be the most common scenarios, but I think they are plausible.

But not necessarily: e.g. a deduplicating backend of a public forge is certainly vulnerable to this.

Yes, and I think this is also a sufficient example to motivate scoring it as not requiring privileges.

(Examples of gitoxide vulnerabilities where the privilege required was low rather than none were CVE-2024-40644, where an attacker would have to have an account on the machine, or CVE-2025-22620, where an attacker would have to have an account or separately act through some service that operates with possibly low, but still non-negligible, privileges.)

Maybe there’s a way to compromise Confidentiality with this in some systems?

It seems to me that the confidentiality impact would be subsequent to, and a separate result of, the integrity impact. I think CVSS 3.1 does not usually incorporate that into base scores, and that allowing that to be expressed is one of the goals of CVSS 4.0.

Or Availability partially from shadowing objects?

That seems plausible and it's a good point I didn't think of. But I think it is effectively a kind of loss of availability due to replacing correct data with other data. Most loss of integrity carries the possibility of such a downstream impact. (Though admittedly this is not always the case: for example, the loss of integrity in CVE-2024-43785 is to information in messages produced by the application, typically without even conferring the ability to write to logs, and does not automatically confer the ability to affect persistent data.)

Idea of an availability impact without a collision

Is it any easier to compute an input that looks like it participates in a collision than to compute one that is actually part of a collision? If so, this vulnerability might have an availability impact of low rather than none, even separately from the effect on integrity (which would of course still be high). I suspect I am mistaken about this. In particular, I do not actually know that it is any easier to compute inputs that look like they participate in collisions than actual collisions. But I figured I'd mention it just in case.

I believe that the likelihood of random inputs triggering the collision detection is extremely low; the sha1collisiondetection repository says lower than 2^−90. I assume that the most efficient known way to generate an input that triggers it is to just try to generate a collision. I’m not sure if you are likely to find “near‐misses” that trigger the detection without actually establishing a full collision in the process.

Yes, I do not mean a brute force search for near-misses. As I understand it, SHA-1 collision detection operates by detecting patterns that are leveraged by the known practical ways of producing a collision. I haven't found evidence that it is significantly easier to generate a single input that shows such a sign than to generate two inputs that exhibit such patterns and leverage them to actually achieve a collision, but it seemed intuitive to me that this might be possible.

That intuition is based on very little, though, since I am not knowledgeable about this area. I think an examination of this question might start with this 2008 paper, or by looking at the details of the existing collision detection algorithm. You are definitely under no obligation to do anything like that, and in any case this vulnerability should definitely not wait on such an examination to have the advisory and the fixed crates released.
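For a sense of scale on the 2^−90 false-positive bound quoted earlier: even at forge scale, the expected number of accidental detections across all objects ever hashed is vanishingly small. A back-of-the-envelope check (the trillion-object count is an illustrative assumption, not a measured number):

```python
# Probability that a random input trips the collision detector,
# per the sha1collisiondetection documentation: below 2**-90.
false_positive_rate = 2.0 ** -90

# Suppose a large forge hashes on the order of a trillion objects.
objects_hashed = 10 ** 12

expected_false_positives = objects_hashed * false_positive_rate
print(f"{expected_false_positives:.2e}")  # → 8.08e-16
```

So any input that does trip the detector is overwhelmingly likely to have been crafted deliberately, which is consistent with treating a detection as hard evidence of an attack rather than as a recoverable error.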

But in any case I don’t think it would affect the score. The availability impact would be introduced by the fix, rather than the vulnerability, right? So it wouldn’t factor into CVSS, but also generally we call turning an integrity compromise into one of availability “good error checking” :)

Sorry, my description of this idea was not clear at all. The availability impact would be due to interaction with other software, including Git itself and with forges such as GitHub that implement SHA-1 collision detection, that exhibits the correct behavior of rejecting anything that looks like a collision. As one example scenario:

  1. An attacker submits a patch to be considered for inclusion in software. The patch does not contain objects with actual SHA-1 collisions, but does contain an object that other existing implementations reject because it looks like it may be part of a collision.
  2. Using gix and ein, or other software that performs Git repository operations using gitoxide library crates, a maintainer fetches and checks out the patch, reviews it, possibly even adding to it, and decides to merge it. This software should match the behavior of other software in rejecting anything that is unsafe to assume has no SHA-1 collision, but it does not.
  3. When the merged work is then uploaded to a remote like GitHub, or otherwise operated on with software that checks for collisions, the specially crafted false-positive collision is rejected. The availability impact of always and immediately rejecting it would effectively have been zero, but now a substantial annoyance is created.

Like that scenario, all other scenarios I can think of also lead, at most, to substantial annoyance rather than greater harm. Given that I don't even know that making fake collisions is feasible, I don't think we need to worry about that, especially since it looks like your patch might already be ready to merge. But I've described it to clarify what I meant, and in case you end up noticing something feasible about it that I have missed.

(Or is the idea that a gitoxide server would happily accept objects that can be used to deny availability to Git clients? I suppose it’s possible, although it seems like you could think of better things to do with the attack funds.)

Yes, this would only matter, and would only merit specifying an impact on availability, if it is significantly cheaper than finding an actual collision.

Identical-prefix collisions

My understanding was that you needed something like what was used to attack OpenPGP to make practical use of this against Git, but if the SHAttered attack could be enough then that does seem pretty bad. I’ll think about how to reword the advisory in light of my newfound uncertainty.

(Thinking about it now, this seems like a pretty silly thing to assume – if the two plaintexts have the same prefix then the SHA‐1 state will be the same after both of them, so unless the initial state is “especially exploitable”, there’s no reason to believe adding a shared prefix would make it harder? But this is way beyond what I’m qualified to speculate about, honestly.)

Well, I'm not sure it's silly--your understanding might have been right. I might well be mistaken in my suspicion that an identical-prefix collision on a Git object could practically be exploited to do real harm. Subsequent to the blog post that I criticized above, Torvalds made a stronger argument in this mailing list post that the kind of collision produced in SHAttered would be harder to make work against Git objects because Git objects are less flexible due to their headers specifying type and size. This is inapplicable in the setting of a chosen-prefix collision (which was not produced until years after that post), but it plausibly constrains the use of identical-prefix collisions. However:

  • He presented this as a mitigating factor when taken together with further validation. I think Git ended up adding collision detection rather than (other) further validation, though I am not certain. However, even if it did, do gitoxide, and any programs that use gix-* crates, implement such validation?
  • It seems to me that this narrows the scenarios where an attack can be carried out using only an identical-prefix collision, and increases the ingenuity required to produce a useful collision. But that does not necessarily make it computationally harder.
  • I also worry about how much protection the distinction really confers. It seems to me that the idea that the Git object format confers protection is related to the idea expressed in the earlier blog post that repositories usually contain source code and that it's hard to hide malware in source code. Whether or not that is true of source code, there are important use cases for committing binary files as part of test suites. Such binary test data was leveraged in the infamous xz-utils backdoor attack. The gitoxide project itself carries pre-generated repositories in .tar files as test fixtures, which unpack to local repositories used in tests, which are trusted when running the tests (at least in the safe.directory sense). Even if reasonable when articulated, it's not clear to me that the mitigating factors against exploiting an identical-prefix collision on a Git object have survived the test of time.

I might get greater clarity on the reasoning and its merits by reading the full Git mailing list discussion from that time. I think the wording in the advisory about the impact is good, allowing readers to evaluate the risks adequately. However, if you judge that it would be highly valuable for me to look into this further, please let me know and I will do so.

Comment 18 by @EliahKagan:

@Byron

So far I was lucky enough to have @EliahKagan manage these.

I've at least gotten started. I think you should look over the crates and versions and also let me know if you think it is reasonable to add more crates (or if you think some I have added should be removed).

In addition, as usual, when the fixed crates are released, or when their exact versions are otherwise known for sure, I recommend changing the upper bounds to be < the fixed versions, rather than <= the current latest affected versions (or asking me to make that change).

[...] and request a CVE, too.

As noted above, it looks like I cannot myself request the CVE. If the system gave me that capability, I would expect to see the button for it somewhere in this part of the interface:

[image: screenshot of the relevant part of the advisory interface]

Comment 19 by @emilazy:

I guess the reason to not just list every crate depending on gix-hash is that it previously didn’t contain the hashing code? I think every crate that currently transitively depends on gix-features with the SHA‐1 feature enabled would be a sensible choice, though.

(In other words the RUSTSEC advisory should probably be for gix-features, though that’s kind of weird since it’s an internal crate, but I guess automated scanning tools and the advisory text will make it clear what needs doing?)

Comment 20 by @emilazy:

I’ve moved this back to being present tense, since I agree that ultimately makes more sense. I’ve also closed the private fork PR.

BTW, it looks like @Byron could give you permissions to handle more of the advisory stuff.

Either that is not supported, or I am not doing it correctly. When I attempt to select you again to add you as "remediation developer" separately from "reporter," it says "Already added":

Yeah, I guess I was misremembering. The current state is fine by me.

So your releasing the text in this way allows me to create the RUSTSEC advisory (or advisories) with the text by opening a PR on the rustsec/advisory-db repository. I think this should be done at or shortly after this advisory on GitHub is published. Even though we don't need secrecy in this case, waiting to open it will allow the correct crate version with the fix to be specified in its metadata.

I don't know if you prefer to open that PR. If so, then that is definitely fine, and I'd be pleased to help out in any way, such as by reviewing it (though I do not presume that my help would be needed). But if you do not prefer to do so, then I'd be pleased to do it. Please let me know. (If I open the advisory-db PR, then I'll include you in the Co-authored-by: trailer in a commit that adds the material of this advisory's text.)

If you don’t mind I’m happy to let you handle this :) I can take a look at the PR of course, but I have no experience with RustSec advisories so I trust you to do a more efficient job than I would.

What you are saying also makes me realize I had not recognized the relevance of the history of claims that the use of SHA-1 in Git was not for security. Such claims have been inaccurate or misleading, but they reflect a design intent for Git repositories where OIDs have been seen as separate from applications where authenticity is required. This further supports viewing this vulnerability as a scope change.

Right. I don’t find those claims too compelling. In practice VCS repositories are upstream of so much that our trust models depend on that I think it’s hard to avoid ever relying on the content‐addressed store to actually address content at any level of the system (which in this case is really an entire ecosystem rather than any one particular system). Signed tags based on unmodified SHA‐1 are worth very little at this point.

(That said, wouldn’t those claims mitigate the claim of a scope change, if taken seriously? In other words, the central question seems to be whether Git counts as a kind of authentication layer that other systems can rely on.)

I don't follow that part, unless you mean to say that on the server side it is presumably remote.

Yes, my mistake. I agree with the rest of what you said about the score.

Yes, I do not mean a brute force search for near-misses. As I understand it, SHA-1 collision detection operates by detecting patterns that are leveraged by the known practical ways of producing a collision. I haven't found evidence that it is significantly easier to generate a single input that shows such a sign than to generate two such inputs that have them and that leverage them to actually achieve a collision, but it seemed intuitive to me that this might be possible.

Right. It seems like an interesting question in principle, but in practice I feel like the cost of the time to have someone qualified look into this and productionize such an attack might exceed the cost of just generating a collision these days, so it’s probably not worth encoding in the score.

Subsequent to the blog post that I criticized above, Torvalds made a stronger argument in this mailing list post that the kind of collision produced in SHAttered would be harder to make work against Git objects because Git objects are less flexible due to their headers specifying type and size. This is inapplicable in the setting of a chosen-prefix collision (which was not produced until years after that post), but it plausibly constrains the use of identical-prefix collisions.

Yes, I also don’t really buy this, for reasons like the xz attack you mentioned. It might constrain its applicability to specific situations, but ultimately an attacker only has to find one viable niche across the entire ecosystem (or rather, can choose between the available niches and potentially make multiple attempts). Binary files in repositories are common, and I imagine that the use of git fsck type checks is not so universal that you can’t smuggle some data into an object in many cases. The Shambles messages are 640 bytes each (and they’re valid OpenPGP keys); I don’t think you need much malleability to get away with it. Arguments about prepending type and length seem to be irrelevant to an attack against the Git object format directly (the existing collisions are all the same size as their colliding twins anyway, and maybe they have to be?), though they certainly saved Git from falling over immediately when colliding files are checked in, as happened with Subversion. It certainly seems true that a chosen‐prefix collision makes an attack a lot easier, though.

I personally think that there’s no need to look into this further, but I’m biased against these types of arguments in general, where the focus is on “debunking” one possible attacker story rather than eliminating an entire class of vulnerabilities. Realistically assessing the impact of a vulnerability is good, but the temptation to do motivated reasoning is pretty strong when you picked SHA‐1 after it was already considered academically broken in the first place. Ultimately the hardened hash function is a simple enough mitigation here that I think there’s no point in most people spending time deciding how bad it is rather than just bumping and getting on with their day.

Thanks for taking this seriously and working with me to get it resolved. I hope we can archive these comments on the PR or something after this is published, since I think this has been a valuable discussion and I believe that only advisory collaborators can see them even post‐publication :)

Comment 21 by @Byron:

BTW, it looks like @Byron could give you permissions to handle more of the advisory stuff.

A great idea, I didn't know, but set it up now, and invite @EliahKagan to try out the new powers. Thanks as well for adding reasonably high-level crates to the list of vulnerable ones. From there, CI/auditing should manage to distribute the warning throughout the ecosystem.

Comment 22 was posted by the github-staff bot about how CVE-2025-31130 was assigned.

Comment 23 by @EliahKagan:

Which crates to list

I guess the reason to not just list every crate depending on gix-hash is that it previously didn’t contain the hashing code?

I don't actually know what crates are best to list in this advisory as affected. But gix-hash not previously containing the hashing code is not the reason I haven't, so far, listed all its dependents as affected.

In addition to gix-hash and gix-features, I listed crates that seem like they contained substantive code changes to take advantage of the new check, beyond just forwarding errors. I acknowledge that this is not necessarily a good approach.

I also listed gitoxide, because users of the gix and ein binaries from gitoxide might otherwise be unaware that they should upgrade. I did not list gitoxide-core, even though gitoxide depends on gitoxide-core which is what uses the affected code, because users of gitoxide-core (if there are any other than gitoxide) would, as with other library crates, presumably find out if they are vulnerable through their other dependencies.

In the presence of numerous closely interoperating crates, there is some ambiguity as to what it means for a crate to be affected by a vulnerability. It is not obvious that if a crate X is affected by a vulnerability and a crate Y depends on it, then Y is affected by the vulnerability. If software in a form where its code can actually be run--for example, an executable--that uses Y is vulnerable because it uses X, it should in principle be sufficient to say that the software is vulnerable due to the presence of the code in X. (One clear exception to this is when Y contains a separate vendored copy of X.)

I don't know if this is the best way to think of it. Does this fail to identify vulnerabilities via cargo deny or Dependabot in projects where no Cargo.lock is committed? Even if so, is that something I should be concerned about?

I think every crate that currently transitively depends on gix-features with the SHA‐1 feature enabled would be a sensible choice, though.

Even considering the above, this does seem like it has some merit and may be better than the slapdash approach currently taken. Those are, after all, the crates where, if a project's Cargo.toml lists any of them, then the project is probably affected by the vulnerability.

I am in no way attached to keeping the specific selection of crates currently listed as affected in this advisory (except that gix-features should definitely be kept; I am moderately inclined to keep gitoxide; and no crates developed outside this project should be listed, though they may warrant their own separate advisories that reference this one).

I wanted to select crates to move things forward, particularly in requesting a CVE, as well as in the hope that a conversation like this might occur, so that gaps in my own knowledge and judgment of what should be listed could be addressed. I'd be happy to remove or add crates. In particular, if you feel the best approach is to list all transitive dependents of gix-features--or, if I can figure them out, just those that (prior to #1915) depended on it with the fast-sha1 or rustsha1 features--then I'd be pleased to change it again. You can also make changes, if you wish.

Other coordination related to publishing advisories

(In other words the RUSTSEC advisory should probably be for gix-features, though that’s kind of weird since it’s an internal crate, but I guess automated scanning tools and the advisory text will make it clear what needs doing?)

Whether or not the crates listed in this advisory are to be changed, I agree that gix-features should probably be what has the RUSTSEC advisory, and that my previous idea of having gix-hash have the RUSTSEC advisory was mistaken. Thanks for pointing this out!

(The strangeness of having a crate that feels internal be the one with the RUSTSEC advisory might be an issue, but once the time comes to open the advisory-db PR--that is, when the fixed crates are released and this advisory is published--I can note the concern in the PR description in case a reviewer considers it a problem or has an alternative approach to offer.)

I’ve also closed the private fork PR.

Thanks! (Although I have the option to delete the temporary private fork, I think that will happen automatically when this is published so I'll probably just let it be deleted that way.)

BTW, it looks like @Byron could give you permissions to handle more of the advisory stuff.

Thanks. After Byron did that, I was able to request the CVE, which was issued.

If you don’t mind I’m happy to let you handle this :) I can take a look at the PR of course, but I have no experience with RustSec advisories so I trust you to do a more efficient job than I would.

No problem!

Very minor further thoughts related to design history in Git

Right. I don’t find those claims too compelling. In practice VCS repositories are upstream of so much that our trust models depend on that I think it’s hard to avoid ever relying on the content‐addressed store to actually address content at any level of the system (which in this case is really an entire ecosystem rather than any one particular system). Signed tags based on unmodified SHA‐1 are worth very little at this point.

Yes.

(That said, wouldn’t those claims mitigate the claim of a scope change, if taken seriously? In other words, the central question seems to be whether Git counts as a kind of authentication layer that other systems can rely on.)

Maybe, I am not sure.

I think the question is whether Git counts as a kind of authentication layer that other systems do rely on, and that is conceptually separate from those systems. The idea that one may sign commits and tags and rely on them, and that one may pin software using full OIDs whether or not the pointed-to commit is GPG signed, is long-standing and seems part of the design--even as the design used SHA-1 based on the idea that those things were separate. I think a flawed security model is still (in a neutral, descriptive sense) a security model. So if it effectively defines scopes, then a vulnerability in one scope that can be exploited in a way that directly affects another scope is a scope change. But I really don't know how widely this reasoning would be accepted.

Arguments about prepending type and length seem to be irrelevant to an attack against the Git object format directly

I think the idea is that it imposes limitations on what one can manage to jam into the object. But on further reflection, this seems more related to getting things into valid DEFLATE encoding (and doing it so the result is of the header-specified length). I haven't experimented with this, but my guess is that it is not a particularly strong barrier.

Further coordination and publishing these comments

Thanks for taking this seriously and working with me to get it resolved.

No problem! If the public PR is nearing being ready to merge and this advisory's metadata, including the list of affected crates, is not in a state that you consider ready, then please definitely feel free to ping me to let me know and I'll either try to fix it up efficiently or ask Byron for help with it.

I hope we can archive these comments on the PR or something after this is published, since I think this has been a valuable discussion and I believe that only advisory collaborators can see them even post‐publication :)

Yes, that's correct: only collaborators have access to the advisory's revision history and to the comments.

I agree with making a copy of the comments available to the public. I could collect them, formatted so the separations between them are clear and so their authors are clear, and post them in one long comment on #1915--or, if preferred, on #585, or elsewhere such as in a new discussion post.

I will want to wait on a few things before doing so:

  • Publication of the advisory. Before that, the comments will be confusing and less useful, and more comments may be posted. (New comments cannot be posted after the advisory is published, other than by the github-staff bot, unless GitHub has changed how that works recently.)
  • A related jj advisory to be published, and/or updated versions of jj-related crates to be released, or otherwise some public mention of the vulnerability there. jj is of course not the only affected downstream software. But you found this vulnerability while working on jj. More significantly, the impact on jj is discussed in the comments here (along with the anticipation that jj will want to patch and issue an advisory). So I'd like to let that move forward first.
  • @Byron You have commented here (and your comments are substantive and important). I would guess that you have no objection to my including your comments here in such a post. But just in case, please let me know.

Commit messages and changelog generation

cargo-smart-release generates changelogs in which commit messages are grouped by type, with particular types of conventional commits emphasized. Since we consider the condition this advisory describes to be a bug, it seems to me that the original commit message title "fix!: detect SHA‐1 collision attacks" might have been preferable, from the perspective of changelog entry generation, to the changed title "feat!: detect SHA‐1 collision attacks" (f253f02).

To the best of my knowledge, this issue is minor given that there shall be advisories and a CVE, and it applies only to that one commit message and not any others.

I wonder if a higher-level test is possible even without a new collision

The advisory mentions this, which I think is correct but I am not totally sure:

Since the SHAttered PDFs are not in a valid format for Git objects, a direct proof‐of‐concept using higher‐level APIs cannot be immediately demonstrated [...]

The part I am wondering about is not the subsequent part about the resources required, which (as discussed above) is sufficiently contextualized by the information in the impact section about the cost of generating collisions. And it is of course true that the SHAttered PDFs are not valid Git objects.

Rather, I am curious if there is some way to demonstrate the presence or absence of SHA-1 collision detection using higher-level APIs on a repository in which the SHAttered PDFs are used as Git objects even though they are not valid Git objects. I do not expect that such an approach could be leveraged in an actual attack, but if feasible then it could be valuable as a demonstration, and as an additional test, not replacing but supplementing the tests already in #1915.

Corrupted but substantially usable Git repositories can be created with arbitrary data as objects. A loose object can be created by compressing arbitrary data with DEFLATE (without gzip headers) and writing it in the appropriate bucket and file under .git/objects. Then, with no further manual modification of the object database or other Git data structures, a tree that incorporates the object can be produced with git hash-object, and a commit that incorporates that tree can be produced with git commit-tree.

#!/usr/bin/env bash
set -euo pipefail

object_data_file="$1"
tree_entry_name="$2"

# Get the OID and ensure its bucket exists.
object_hash="$(sha1sum -- "$object_data_file" | awk '{ print $1 }')"
bucket=".git/objects/${object_hash:0:2}"
mkdir -p -- "$bucket"

# Create the loose object.
<"$object_data_file" python3 -c '
import sys
import zlib
sys.stdout.buffer.write(zlib.compress(sys.stdin.buffer.read()))
' >"$bucket/${object_hash:2}"

# Create a tree that has the object as an entry.
tree_hash="$({
    printf '100644 %s\0' "$tree_entry_name"
    printf '%s' "$object_hash" | xxd -r -p
} | git hash-object -t tree -w --stdin --literally)"

# Create a commit with that tree, and set the current branch to it.
commit_hash="$(git commit-tree -m 'Initial commit' "$tree_hash")"
branch="$(git symbolic-ref --short HEAD)"
git branch -f -- "$branch" "$commit_hash"

# Show what we have. (Can run `git fsck` afterwards to show the corruption.)
set -x
git log
git ls-tree HEAD
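For contrast with the raw sha1sum used in the script above, Git's own OID computation prefixes a type-and-size header to the content before hashing, which is why objects stored under the plain hash of their raw data are addressable yet corrupt. A minimal Python sketch of the normal blob computation (the helper name is mine):

```python
import hashlib

def git_blob_oid(data: bytes) -> str:
    # Git hashes "<type> <size>\0" + content, not the raw bytes alone.
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()

# Matches `git hash-object --stdin` on the same input.
print(git_blob_oid(b"hello world\n"))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

The script above deliberately skips this header so that the stored object's name equals the SHA-1 of the raw file, which is what lets the SHAttered PDFs keep their colliding digests inside the object database.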

The question is then whether any higher-level operation will get far enough to report an SHA-1 collision, rather than first finding that the object is corrupt by not having a valid Git object header and stopping before getting that far.

The latter is what has happened in most experiments I have attempted so far. The exception is experiments with gix verify itself. There, corruption in a single loose object, at least when the object is expected to be a blob and it does not even have a valid Git object header, seems to always give an "Objects were deleted during iteration - try again" error even in the absence of any indication that it occurred due to a data race. I'll try to report that as a bug. But that bug is independent of this vulnerability, and I have checked that it is independent of the changes in #1915.

I'll let you know if I figure out anything that seems like it could be of use.

Comment 24 by @Byron:

@Byron You have commented here (and your comments are substantive and important). I would guess that you have no objection to my including your comments here in such a post. But just in case, please let me know.

No, and in general I treat everything written here as public even though it won't end up being publicly readable by default.

I'll try to report that as a bug. But that bug is independent of this vulnerability, and I have checked that it is independent of the changes in #1915.

Even though gix verify is by no means a gix fsck and more of a demo than anything serious, it's a strange error even for a repository as strange as this.

Comment 25 by @emilazy:

Even considering the above, this does seem like it has some merit and may be better than the slapdash approach currently taken. Those are, after all, the crates where, if a project's Cargo.toml lists any of them, then the project is probably affected by the vulnerability.

Right. I think the two options that make sense are:

  • just list gix-features and rely on people’s dependency scanning to handle transitive stuff; or
  • explicitly list all crates downstream of a hashing‐enabled gix-features.

I think I prefer the latter, because it avoids ambiguity about whether a scan complaining about gix-features is a false positive. It also means that we’re reporting every “first‐party” crate that is likely to expose functionality that is vulnerable, which seems like a good standard to me. So I would personally list gix-features, then search for rustsha1 in the previous release to find all crates that enable hashing, list those, and then list their transitive dependents.

it seems to me that the original commit message title "fix!: detect SHA‐1 collision attacks" might have been preferable, from the perspective of changelog entry generation, to the changed title "feat!: detect SHA‐1 collision attacks"

Yeah, I hemmed and hawed on this, but after the commits were reorganized, that commit ended up introducing a new fallible hashing API directly, so I figured it was leaning more towards the “breaking feature” end of things. But my opinion on conventional commits is somewhat like my opinion on CVSS :)

FWIW, Jujutsu has conveniently(?) had an issue with the release we just cut, so if it’s possible to get the gitoxide crates released and this advisory published in the next day or two that would be great. (But of course we can always just cut a second patch release if not.)

I tried to see if I could get Git to hash non‐object contents like you did, but you got further than me. Does that provide an easy way to check whether a given Git binary has the collision detection enabled?

Comment 26 by @EliahKagan:

Delisting gix-hash

The most important correction, whether we keep what I've got now or change it, is the removal of gix-hash. I was wrong to list gix-hash, and we should not list that as affected in any advisory related to this vulnerability. gix-hash abstracted over the specific implementation, and it only had a dev dependency on gix-features.

(It also had numerous transitive dependents, a number of which seem like they don't plausibly expose this vulnerability, and a few of which definitely don't, such as gix-tempfile. That depends on gix-hash via gix-fs. The only occurrence of gix_fs in the code of gix-tempfile is this use and re-export of gix_fs::dir::create and gix_fs::dir::remove.)

The long list

Right. I think the two options that make sense are:

  • just list gix-features and rely on people’s dependency scanning to handle transitive stuff; or
  • explicitly list all crates downstream of a hashing‐enabled gix-features.

By the second option, do you mean listing those in addition to gix-features? From this, I think that's what you mean:

So I would personally list gix-features, then search for rustsha1 in the previous release to find all crates that enable hashing, list those, and then list their transitive dependencies.

The outcome of that is expansive, but perhaps reasonable in the context of this particular vulnerability. Crates with a direct non-dev dependency on gix-features that specify rustsha1 or fast-sha1 are:

gix-commitgraph
gix-index
gix-object
gix-odb
gix-pack

Of these, gix-index, gix-odb, and gix-pack depend on gix-object. So if somehow it could be justified to not consider gix-features itself to be vulnerable, a modified approach like this could still be followed. We would need two RUSTSEC advisories: one for gix-object and another for gix-commitgraph. I don't know if it's really reasonable to consider gix-features not to be vulnerable, though.

Anyway, taking those and adding their transitive non-dev dependents, as well as gix-features, gives:

gitoxide
gitoxide-core
gix
gix-archive
gix-blame
gix-commitgraph
gix-config
gix-diff
gix-dir
gix-discover
gix-features
gix-filter
gix-fsck
gix-index
gix-merge
gix-negotiate
gix-object
gix-odb
gix-pack
gix-protocol
gix-ref
gix-revision
gix-revwalk
gix-status
gix-traverse
gix-worktree
gix-worktree-state

To move things closer to the finish line, I've gone ahead and made that change.
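The "add transitive dependents" step is just a reachability closure over the reversed dependency graph; a toy Python sketch of the idea (the edge set here is illustrative and deliberately incomplete, not the real gitoxide workspace graph):

```python
from collections import deque

# Hypothetical reversed dependency edges: crate -> crates that directly
# depend on it (non-dev). Not the actual, complete workspace graph.
DEPENDENTS = {
    "gix-features": ["gix-object", "gix-commitgraph"],
    "gix-object": ["gix-odb", "gix-pack", "gix-index"],
    "gix-odb": ["gix"],
    "gix-pack": ["gix"],
    "gix-index": ["gix"],
    "gix": ["gitoxide"],
}

def affected(root: str) -> set[str]:
    """Breadth-first reachability over the reversed dependency edges."""
    seen = {root}
    queue = deque([root])
    while queue:
        for crate in DEPENDENTS.get(queue.popleft(), []):
            if crate not in seen:
                seen.add(crate)
                queue.append(crate)
    return seen

print(sorted(affected("gix-features")))
```

In practice the same query can be answered from a workspace checkout with something like `cargo tree --invert gix-features`, which lists the crates that depend on it.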

Benefits and alternatives

I think I prefer the latter, because it avoids ambiguity about whether a scan complaining about gix-features is a false positive.

This may be avoided when the scan is done by something like Dependabot (when used for alerts or security updates). But if the RUSTSEC advisory is for gix-features, then there would still be that ambiguity when using tools such as cargo audit and cargo deny, if I understand correctly the kind of ambiguity you are talking about.

It also means that we’re reporting every “first‐party” crate that is likely to expose functionality that is vulnerable, which seems like a good standard to me.

I would be reluctant to adopt that as a general standard. Often there are one or more specific crates that are responsible for guaranteeing something or avoiding something, yet fail, causing a vulnerability. When that happens, I think there are accuracy and clarity reasons to prefer listing those crates and few if any others.

However, here we don't have that. It's not obvious what crates should be considered responsible for resisting OID collision attacks. But it is clear that the weakness of not resisting them is a vulnerability.

Another possibility is to go with what I had before, with the correction of removing gix-hash, in that it is no longer quite as subjective and arbitrary in view of the key significance of those five packages. That would be:

gitoxide
gix-commitgraph
gix-index
gix-object
gix-odb
gix-pack

There, the justification for including gitoxide but not intermediate dependencies is as in that comment, and the other crates are the ones that depend directly on gix-features with SHA-1 related features.

Anyway, I am okay with the current long listing. If you and @Byron are also both okay with all that, then let's go with this. (Otherwise, let's pick some other option.)

Checking my work

Just in case I am misunderstanding something (or made some other mistake), this subsection describes how I got the long listing. First:

git switch -d v0.41.0
git grep -Fwn -e fast-sha1 -e rustsha1 -- gix-*/Cargo.toml

That showed:

gix-commitgraph/Cargo.toml:23:gix-features = { version = "^0.40.0", path = "../gix-features", features = ["rustsha1"] }
gix-features/Cargo.toml:85:## Takes precedence over `rustsha1` if both are specified.
gix-features/Cargo.toml:86:fast-sha1 = ["dep:sha1"]
gix-features/Cargo.toml:88:rustsha1 = ["dep:sha1_smol"]
gix-features/Cargo.toml:99:required-features = ["rustsha1"]
gix-features/Cargo.toml:104:required-features = ["parallel", "rustsha1"]
gix-features/Cargo.toml:109:required-features = ["parallel", "rustsha1"]
gix-features/Cargo.toml:114:required-features = ["rustsha1"]
gix-features/Cargo.toml:135:# hashing and 'fast-sha1' feature
gix-hash/Cargo.toml:31:gix-features = { path = "../gix-features", features = ["rustsha1"] }
gix-index/Cargo.toml:26:    "rustsha1",
gix-object/Cargo.toml:45:    "rustsha1",
gix-odb/Cargo.toml:23:gix-features = { version = "^0.40.0", path = "../gix-features", features = ["rustsha1", "walkdir", "zlib", "crc32"] }
gix-pack/Cargo.toml:37:gix-features = { version = "^0.40.0", path = "../gix-features", features = ["crc32", "rustsha1", "progress", "zlib"] }

This suggests gix-commitgraph, gix-hash, gix-index, gix-object, gix-odb, and gix-pack as implicated. But while gix-hash depends on gix-features and specifies rustsha1, it is actually only a dev dependency, used in gix-hash's tests:

[dev-dependencies]
gix-testtools = { path = "../tests/tools" }
gix-features = { path = "../gix-features", features = ["rustsha1"] }

In contrast, for the others, these are in the [dependencies] section of Cargo.toml:

gix-features = { version = "^0.40.0", path = "../gix-features", features = ["rustsha1"] }

gix-features = { version = "^0.40.0", path = "../gix-features", features = [
"rustsha1",
"progress",
] }

gix-features = { version = "^0.40.0", path = "../gix-features", features = [
"rustsha1",
"progress",
] }

gix-features = { version = "^0.40.0", path = "../gix-features", features = ["rustsha1", "walkdir", "zlib", "crc32"] }

gix-features = { version = "^0.40.0", path = "../gix-features", features = ["crc32", "rustsha1", "progress", "zlib"] }
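The dev-only versus normal dependency distinction can also be checked programmatically: `cargo metadata` emits JSON in which each dependency entry carries a `kind` field (`null` for a normal dependency, `"dev"` for a dev dependency) and a `features` array. A hedged sketch over hand-written sample JSON (the package entries below are illustrative, not real `cargo metadata` output):

```python
import json

# Hand-written sample mimicking the relevant slice of
# `cargo metadata --format-version 1` output: each package lists its
# dependencies with a "kind" (null = normal, "dev" = dev-only).
sample = json.loads("""
{
  "packages": [
    {"name": "gix-hash",
     "dependencies": [{"name": "gix-features", "kind": "dev", "features": ["rustsha1"]}]},
    {"name": "gix-object",
     "dependencies": [{"name": "gix-features", "kind": null, "features": ["rustsha1"]}]}
  ]
}
""")

# Packages whose *normal* (non-dev) dependency on gix-features enables rustsha1.
implicated = [
    p["name"]
    for p in sample["packages"]
    if any(
        d["name"] == "gix-features" and d["kind"] is None and "rustsha1" in d["features"]
        for d in p["dependencies"]
    )
]
print(implicated)  # gix-hash is excluded: its gix-features dependency is dev-only
```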

I tried for a bit to find approaches that didn't require parsing while still being sufficiently flexible. There probably are such approaches, but I didn't find them. I ran some commands such as cargo tree -e no-dev -i gix-commitgraph to make sure the kind of information it gave was relevant and to see how to parse it. Then:

(
    set -eo pipefail
    (
        for package in gix-commitgraph gix-index gix-object gix-odb gix-pack; do
            cargo tree -e no-dev -i "$package"
        done
    ) | grep -oP '├── \K[\w-]+' | sort -u
)

(The outer ( ) subshell ensures this can be run in an interactive shell without the set options applying to subsequent interaction.)

That gives:

gitoxide
gitoxide-core
gix
gix-archive
gix-blame
gix-config
gix-diff
gix-dir
gix-discover
gix-filter
gix-fsck
gix-index
gix-merge
gix-negotiate
gix-odb
gix-pack
gix-protocol
gix-ref
gix-revision
gix-revwalk
gix-status
gix-traverse
gix-worktree
gix-worktree-state

Combining the two lists and adding gix-features gives the list shown above, which is now listed as affected in this advisory.
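One caveat about the extraction step: cargo tree draws the last dependent under each node with a `└── ` prefix rather than `├── `, so a pattern matching only `├── ` could in principle miss entries; a character class matching both glyphs is safer. A minimal Python sketch over illustrative (hand-written, not real) cargo tree output:

```python
import re

# Illustrative (not real) `cargo tree -e no-dev -i <package>` output:
# the tool draws branches with both "├── " and "└── " prefixes.
sample = """\
gix-object v0.47.0
├── gix v0.70.0
│   ├── gitoxide-core v0.45.0
│   │   └── gitoxide v0.41.0
│   └── gix-odb v0.67.0
└── gix-pack v0.57.0"""

# Accept both branch glyphs so the last child of each node is not missed.
dependents = sorted(set(re.findall(r"[├└]── ([A-Za-z0-9_-]+)", sample)))
print(dependents)
# ['gitoxide', 'gitoxide-core', 'gix', 'gix-odb', 'gix-pack']
```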


Other metadata change

While making those other changes, I've gone ahead and given myself analyst credit. I hope that's okay. I'd be happy to have that removed or changed if there is any concern about it.

Conventional commits

Yeah, I hemmed and hawed on this, but after the commits were reorganized, that commit ended up introducing a new fallible hashing API directly, so I figured it was leaning more towards the “breaking feature” end of things. [...]

Seems fine. It's not wrong, and the advisories and CVE will make clear that the change fixes a defect.

Releasing

FWIW, Jujutsu has conveniently(?) had an issue with the release we just cut, so if it’s possible to get the gitoxide crates released and this advisory published in the next day or two that would be great.

With #1915 having been merged, and with the advisory and its metadata having been further polished, I don't know of any remaining blocker for a release, unless further adjustments such as along the lines of #1915 (comment) need to be made.

The only other potentially valuable refinements I am aware of are those that affect the tests, but the existing tests are already robust enough to have justified merging the PR and to do a release.

@Byron How do you feel about doing a release soon? You can then fix up crate versions and bounds in this advisory and publish it, or ask me to do that. (Unless you prefer to do it yourself, I suggest pinging me to ask me to do it; but if I don't reply soon, do it yourself so that there isn't too big a gap in time between the crate releases and the advisory.)

I know there is some other stuff in progress, such as #1917, but that seems to involve quite a bit of subtlety and shouldn't be rushed. I'm hoping we could have a release soon for this, and perhaps then still do another release around the 22nd or whenever appropriate.

Possible future (non-blocking) refinements

The further non-blocking possible refinements I alluded to above are:

  • Maybe an extra test using a SHAttered PDF as though it were a Git object, if I can figure out some way to get a collision detection error rather than an error about the more obvious corruption. I think this is very much nonessential, and possibly not feasible (see below).

  • Due to ABI differences between different 32-bit targets, the size_of_hasher test wrongly fails on i686-pc-windows-msvc. I can fix that while keeping the test sufficiently robust; I will probably open a PR to change the assertion to use size_ok.

    Even if this test bug were new, it need not block anything. But it is not new: it affected that target before when fast-sha1 was enabled, but not with rustsha1. It was not detected before because we don't regularly test that target and because I have often used --no-default-features --features max-pure when doing so, due to difficulties building some other non-Rust dependencies on it.
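The size_ok fix mentioned above can be sketched as follows. This is a hedged approximation of that helper's shape, not the actual gitoxide implementation: the idea is to pin the size exactly on 64-bit targets while only bounding it on 32-bit targets, where ABI differences (such as on i686-pc-windows-msvc) can change layout.

```rust
// Sketch (not the actual gitoxide helper): exact size check on 64-bit
// targets, upper-bound check on 32-bit targets whose ABIs may differ.
fn size_ok(actual: usize, expected_64_bit: usize) -> bool {
    if cfg!(target_pointer_width = "64") {
        actual == expected_64_bit
    } else {
        actual <= expected_64_bit
    }
}

fn main() {
    // Example: a hypothetical hasher holding a 5-word state and a length
    // counter; assert a 64-bit size that 32-bit targets may undercut.
    let hasher_size = std::mem::size_of::<[u32; 5]>() + std::mem::size_of::<u64>();
    assert!(size_ok(hasher_size, 28));
    assert!(!size_ok(29, 28)); // too large on every target
}
```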

Testing with corrupted objects

I tried to see if I could get Git to hash non‐object contents like you did, but you got further than me. Does that provide an easy way to check whether a given Git binary has the collision detection enabled?

I don't know if this is feasible. I will look into it further. So far, both with Git and with gitoxide, other errors always occur first, in the tests I've done.

Other ways to inspect git

For finding out if a git binary has collision detection enabled, I think other techniques might be feasible, such as examining the binary, or even opening it in a debugger (which might provide sufficient introspection even in the usual case that it is a release build, especially if external symbols can be loaded).

For Apple Git, which came up above in that comment, based on this post I think the modified source code is in jeremyhu/git, though I don't know if that supplies enough information to figure out the build configuration used in the binaries distributed in Xcode or the Apple developer tools.

Comment 27 by @emilazy:

By the second option, do you mean listing those in addition to gix-features?

Right.

But if the RUSTSEC advisory is for gix-features, then there would still be that ambiguity when using tools such as cargo audit and cargo deny, if I understand correctly the kind of ambiguity you are talking about.

I think it’s precisely in the cases where you have such tooling that listing “just” the root is probably okay, since it’s going to be checking the transitive dependencies.

I would be reluctant to adopt that as a general standard. Often there are one or more specific crates that are responsible for guaranteeing something or avoiding something, yet fail, causing a vulnerability. When that happens, I think there are accuracy and clarity reasons to prefer listing those crates and few if any others.

I can understand that perspective from the gitoxide side. However, in my opinion, most people don’t really care about the internal code organization or what crate is “responsible” for the vulnerability. They most likely use gitoxide via the gix crate, or they use one or two specific crates relevant to their specific task at hand. They don’t think about how it uses gix-features, or any other crate, internally to accomplish a specific goal; they just signed up to get packfile support, and it doesn’t matter to them that packfile verification has a vulnerability because of some other gitoxide crate. The main question is likely to be “wait, I use gix-* crates; am I affected?”, and listing them like you now have here makes it easy to answer that question: if you see one of those versions you have a problem. The scope naturally ends at the gitoxide project, because it’s the one issuing the advisory and these crates are the ones it’s responsible for. My Jujutsu change will bump the gix dependency; the advisory now helps people see that this addresses the issue, where before they’d have to check the dependency tree.

Of course, that doesn’t help when people use some other library that itself uses gitoxide. And people should adopt automated scanning tools to address that kind of risk. Still, in some ways that’s an argument for libraries to issue their own advisories when there’s a non‐obvious way a vulnerability in a dependency leaks through. I do agree that this results in a sort of cascading CVE effect where bugs in a dependency could result in advisories downstream. I’m not sure that’s a bad thing, though, whenever there’s additional nuance provided by the later layers. But in this case we’re just talking about a single CVE and a list of crates in it, and I think there’s little downside to being expansive.

While making those other changes, I've gone ahead and given myself analyst credit. I hope that's okay. I'd be happy to have that removed or changed if there is any concern about it.

More than deserved! Thanks for all you’ve done here and I hope it hasn’t taken up too much of your time.

Comment 28 by @Byron:

Thanks everyone, I think this is a wrap!

I am also glad that Eliah could credit himself, hoping that he will keep doing that.

It's very interesting to have finally seen https://github.com/jeremyhu/git; perusing the list of added patches didn't reveal any intentional change to the hashing algorithm, though I didn't look at the actual patch contents either.

Besides that, I plan to make a release today, and will leave it to @EliahKagan to handle this advisory, including the publishing.

Comment 29 by @Byron:

A new release is now available on crates.io: #1919

Comment 30 by @EliahKagan:

Thanks--I'll look over and fix up the listed versions and publish this advisory shortly.

Comment 31 by @emilazy:

Jujutsu PR is up at jj-vcs/jj#6238. Thanks again!

Comment 32 by @EliahKagan:

Thanks to everyone!!

I've checked over the versions to ensure they match the versions actually released, and edited them just to put them in the better form (with < lower bounds). I'll publish this advisory now.

More than deserved! Thanks for all you’ve done here

Thanks!

and I hope it hasn’t taken up too much of your time.

Quite the opposite: this has been an excellent use of my time.

And thanks for making gitoxide more secure!

Successfully merging this pull request may close these issues:

  • Use hardened Sha1 implementations (collision detection)