Skip to content

Commit f253f02

Browse files
committed
feat!: detect SHA‐1 collision attacks
Fix [GHSA-2frx-2596-x5r6]. [GHSA-2frx-2596-x5r6]: GHSA-2frx-2596-x5r6 This uses the `sha1-checked` crate from the RustCrypto project. It’s a pure Rust implementation, with no SIMD or assembly code. The hashing implementation moves to `gix-hash`, as it no longer depends on any feature configuration. I wasn’t sure the ideal crate to put this in, but after checking reverse dependencies on crates.io, it seems like there’s essentially no user of `gix-hash` that wouldn’t be pulling in a hashing implementation anyway, so I think this is a fine and logical place for it to be. A fallible API seems better than killing the process as Git does, since we’re in a library context and it would be bad if you could perform denial‐of‐service attacks on a server by sending it hash collisions. (Although there are probably cheaper ways to mount a denial‐of‐service attack.) The new API also returns an `ObjectId` rather than `[u8; 20]`; the vast majority of `Hasher::digest()` users immediately convert the result to `ObjectId`, so this will help eliminate a lot of cruft across the tree. `ObjectId` also has nicer `Debug` and `Display` instances than `[u8; 20]`, and should theoretically make supporting the hash function transition easier, although I suspect further API changes will be required for that anyway. I wasn’t sure whether this would be a good change, as not every digest identifies an entry in the Git object database, but even many of the existing uses for non‐object digests across the tree used the `ObjectId` API anyway. Perhaps it would be best to have a separate non‐alias `Digest` type that `ObjectId` wraps, but this seems like the pragmatic choice for now that sticks with current practice. The old API remains in this commit, as well as a temporary non‐fallible but `ObjectId`‐returning `Hasher::finalize()`, pending the migration of all in‐tree callers. I named the module `gix_hash::hasher` since `gix_hash::hash` seemed like it would be confusing. This does mean that there is a function and module with the same name, which is permitted but perhaps a little strange. Everything is re‐exported directly other than `gix_features::hash::Write`, which moves along with the I/O convenience functions into a new public submodule and becomes `gix_hash::hasher::io::Write`, as that seems like a clearer name to me, being akin to the `gix_hash::hasher` function but as an `std::io::Write` wrapper. Raw hashing is somewhere around 0.25× to 0.65× the speed of the previous implementation, depending on the feature configuration and whether the CPU supports hardware‐accelerated hashing. (The more portable assembly in `sha1-asm` that doesn’t require the SHA instruction set doesn’t seem to speed things up that much; in fact, `sha1_smol` somehow regularly beats the assembly code used by `sha1` on my i9‐9880H MacBook Pro! Presumably this is why that path was removed in newer versions of the `sha1` crate.) Performance on an end‐to‐end `gix no-repo pack verify` benchmark using pack files from the Linux kernel Git server measures around 0.41× to 0.44× compared to the base commit on an M2 Max and a Ryzen 7 5800X, both of which have hardware instructions for SHA‐1 acceleration that the previous implementation uses but this one does not. On the i9‐9880H, it’s around 0.58× to 0.60× the speed; the slowdown is reduced by the older hardware’s lack of SHA‐1 instructions. The `sha1collisiondetection` crate from the Sequoia PGP project, based on a modified C2Rust translation of the library used by Git, was also considered; although its raw hashing performance seems to measure around 1.12–1.15× the speed of `sha1-checked` on x86, it’s indistinguishable from noise on the end‐to‐end benchmark, and on an M2 Max `sha1-checked` is consistently around 1.03× the speed of `sha1collisiondetection` on that benchmark. The `sha1collisiondetection` crate has also had a soundness issue in the past due to the automatic C translation, whereas `sha1-checked` has only one trivial `unsafe` block. On the other hand, `sha1collisiondetection` is used by both Sequoia itself and the `gitoid` crate, whereas rPGP is the only major user of `sha1-checked`. I don’t think there’s a clear winner here. The performance regression is very unfortunate, but the [SHAttered] attack demonstrated a collision back in 2017, and the 2020 [SHA‐1 is a Shambles] attack demonstrated a practical chosen‐prefix collision that broke the use of SHA‐1 in OpenPGP, costing $75k to perform, with an estimate of $45k to replicate at the time of publication and $11k for a classical collision. [SHAttered]: https://shattered.io/ [SHA‐1 is a Shambles]: https://sha-mbles.github.io/ Given the increase in GPU performance and production since then, that puts the Git object format squarely at risk. Git mitigated this attack in 2017; the algorithm is fairly general and detects all the existing public collisions. My understanding is that an entirely new cryptanalytic approach would be required to develop a collision attack for SHA‐1 that would not be detected with very high probability. I believe that the speed penalty could be mitigated, although not fully eliminated, by implementing a version of the hardened SHA‐1 function that makes use of SIMD. For instance, the assembly code used by `openssl speed sha1` on my i9‐9880H measures around 830 MiB/s, compared to the winning 580 MiB/s of `sha1_smol`; adding collision detection support to that would surely incur a performance penalty, but it is likely that it could be much more competitive with the performance before this commit than the 310 MiB/s I get with `sha1-checked`. I haven’t been able to find any existing work on this; it seems that more or less everyone just uses the original C library that Git does, presumably because nothing except Git and OpenPGP is still relying on SHA‐1 anyway… The performance will never compete with the >2 GiB/s that can be achieved with the x86 SHA instruction set extension, as the `SHA1RNDS4` instruction sadly runs four rounds at a time while the collision detection algorithm requires checks after every round, but I believe SIMD would still offer a significant improvement, and the AArch64 extension seems like it may be more flexible. I know that these days the Git codebase has an additional faster unsafe API without these checks that it tries to carefully use only for operations that do not depend on hashing results for correctness or safety. I personally believe that’s not a terribly good idea, as it seems easy to misuse in a case where correctness actually does matter, but maybe that’s just my Rust safety bias talking. I think it would be better to focus on improving the performance of the safer algorithm, as I think that many of the operations where the performance penalty is the most painful are dealing with untrusted input anyway. The `Hasher` struct gets a lot bigger; I don’t know if this is an issue or not, but if it is, it could potentially be boxed. Closes: #585
1 parent 7805ffe commit f253f02

File tree

13 files changed

+292
-242
lines changed

13 files changed

+292
-242
lines changed

Diff for: Cargo.lock

+7-7
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

Diff for: Cargo.toml

+1-2
Original file line numberDiff line numberDiff line change
@@ -205,8 +205,7 @@ gix-hash = { opt-level = 3 }
205205
gix-actor = { opt-level = 3 }
206206
gix-config = { opt-level = 3 }
207207
miniz_oxide = { opt-level = 3 }
208-
sha1 = { opt-level = 3 }
209-
sha1_smol = { opt-level = 3 }
208+
sha1-checked = { opt-level = 3 }
210209

211210
[profile.release]
212211
overflow-checks = false

Diff for: SHORTCOMINGS.md

-4
Original file line numberDiff line numberDiff line change
@@ -35,7 +35,3 @@ This file is for tracking features that are less well implemented or less powerf
3535
* **gix-url** _might_ be more restrictive than what git allows as for the most part, it uses a browser grade URL parser.
3636
* Thus far there is no proof for this, and as _potential remedy_ we could certainly re-implement exactly what git does
3737
to handle its URLs.
38-
39-
### `gix-features`
40-
41-
* **sha1** isn't hardened (i.e. doesn't have collision detection). Needs [to be contributed](https://github.com/GitoxideLabs/gitoxide/issues/585).

Diff for: gix-features/Cargo.toml

+7-26
Original file line numberDiff line numberDiff line change
@@ -24,7 +24,7 @@ progress = ["prodash"]
2424
## Provide human-readable numbers as well as easier to read byte units for progress bars.
2525
progress-unit-human-numbers = ["prodash?/unit-human"]
2626
## Provide human readable byte units for progress bars.
27-
progress-unit-bytes = ["dep:bytesize", "prodash?/unit-bytes"]
27+
progress-unit-bytes = ["dep:bytesize", "prodash?/unit-bytes", "gix-hash/progress-unit-bytes"]
2828

2929
## Provide utilities suitable for working with the `std::fs::read_dir()`.
3030
fs-read-dir = ["dep:gix-utils"]
@@ -77,40 +77,29 @@ zlib-stock = ["zlib", "flate2?/zlib"]
7777
## may build in environments where other backends don't.
7878
zlib-rust-backend = ["zlib", "flate2?/rust_backend"]
7979

80-
#! ### Mutually Exclusive SHA1
81-
## A fast SHA1 implementation is critical to `gitoxide's` object database performance
82-
## A multi-crate implementation that can use hardware acceleration, thus bearing the potential for up to 2Gb/s throughput on
83-
## CPUs that support it, like AMD Ryzen or Intel Core i3, as well as Apple Silicon like M1.
84-
## Takes precedence over `rustsha1` if both are specified.
85-
fast-sha1 = ["dep:sha1"]
86-
## A standard and well performing pure Rust implementation of Sha1. Will significantly slow down various git operations.
87-
rustsha1 = ["dep:sha1_smol"]
80+
# TODO: Remove these.
81+
fast-sha1 = []
82+
rustsha1 = []
8883

8984
#! ### Other
9085

9186
## Count cache hits and misses and print that debug information on drop.
9287
## Caches implement this by default, which costs nothing unless this feature is enabled
9388
cache-efficiency-debug = []
9489

95-
[[test]]
96-
name = "hash"
97-
path = "tests/hash.rs"
98-
required-features = ["rustsha1"]
99-
10090
[[test]]
10191
name = "parallel"
10292
path = "tests/parallel_threaded.rs"
103-
required-features = ["parallel", "rustsha1"]
93+
required-features = ["parallel"]
10494

10595
[[test]]
10696
name = "multi-threaded"
10797
path = "tests/parallel_shared_threaded.rs"
108-
required-features = ["parallel", "rustsha1"]
98+
required-features = ["parallel"]
10999

110100
[[test]]
111101
name = "single-threaded"
112102
path = "tests/parallel_shared.rs"
113-
required-features = ["rustsha1"]
114103

115104
[[test]]
116105
name = "pipe"
@@ -130,10 +119,8 @@ parking_lot = { version = "0.12.0", default-features = false, optional = true }
130119

131120
walkdir = { version = "2.3.2", optional = true } # used when parallel is off
132121

133-
# hashing and 'fast-sha1' feature
134-
sha1_smol = { version = "1.0.0", optional = true }
122+
# hashing
135123
crc32fast = { version = "1.2.1", optional = true }
136-
sha1 = { version = "0.10.0", optional = true }
137124

138125
# progress
139126
prodash = { version = "29.0.1", optional = true }
@@ -156,12 +143,6 @@ libc = { version = "0.2.119" }
156143
[dev-dependencies]
157144
bstr = { version = "1.3.0", default-features = false }
158145

159-
160-
# Assembly doesn't yet compile on MSVC on windows, but does on GNU, see https://github.com/RustCrypto/asm-hashes/issues/17
161-
# At this time, only aarch64, x86 and x86_64 are supported.
162-
[target.'cfg(all(any(target_arch = "aarch64", target_arch = "x86", target_arch = "x86_64"), not(target_os = "windows")))'.dependencies]
163-
sha1 = { version = "0.10.0", optional = true, features = ["asm"] }
164-
165146
[package.metadata.docs.rs]
166147
all-features = true
167148
features = ["document-features"]

Diff for: gix-features/src/hash.rs

+6-177
Original file line numberDiff line numberDiff line change
@@ -1,54 +1,12 @@
11
//! Hash functions and hash utilities
2-
//!
3-
//! With the `fast-sha1` feature, the `Sha1` hash type will use a more elaborate implementation utilizing hardware support
4-
//! in case it is available. Otherwise the `rustsha1` feature should be set. `fast-sha1` will take precedence.
5-
//! Otherwise, a minimal yet performant implementation is used instead for a decent trade-off between compile times and run-time performance.
6-
#[cfg(all(feature = "rustsha1", not(feature = "fast-sha1")))]
7-
mod _impl {
8-
use super::Digest;
9-
10-
/// A implementation of the Sha1 hash, which can be used once.
11-
#[derive(Default, Clone)]
12-
pub struct Sha1(sha1_smol::Sha1);
13-
14-
impl Sha1 {
15-
/// Digest the given `bytes`.
16-
pub fn update(&mut self, bytes: &[u8]) {
17-
self.0.update(bytes);
18-
}
19-
/// Finalize the hash and produce a digest.
20-
pub fn digest(self) -> Digest {
21-
self.0.digest().bytes()
22-
}
23-
}
24-
}
25-
26-
/// A hash-digest produced by a [`Hasher`] hash implementation.
27-
#[cfg(any(feature = "fast-sha1", feature = "rustsha1"))]
28-
pub type Digest = [u8; 20];
29-
30-
#[cfg(feature = "fast-sha1")]
31-
mod _impl {
32-
use sha1::Digest;
33-
34-
/// A implementation of the Sha1 hash, which can be used once.
35-
#[derive(Default, Clone)]
36-
pub struct Sha1(sha1::Sha1);
37-
38-
impl Sha1 {
39-
/// Digest the given `bytes`.
40-
pub fn update(&mut self, bytes: &[u8]) {
41-
self.0.update(bytes);
42-
}
43-
/// Finalize the hash and produce a digest.
44-
pub fn digest(self) -> super::Digest {
45-
self.0.finalize().into()
46-
}
47-
}
48-
}
492
3+
// TODO: Remove this.
504
#[cfg(any(feature = "rustsha1", feature = "fast-sha1"))]
51-
pub use _impl::Sha1 as Hasher;
5+
pub use gix_hash::hasher::{
6+
hasher,
7+
io::{bytes, bytes_of_file, bytes_with_hasher, Write},
8+
Digest, Hasher,
9+
};
5210

5311
/// Compute a CRC32 hash from the given `bytes`, returning the CRC32 hash.
5412
///
@@ -71,132 +29,3 @@ pub fn crc32(bytes: &[u8]) -> u32 {
7129
h.update(bytes);
7230
h.finalize()
7331
}
74-
75-
/// Produce a hasher suitable for the given kind of hash.
76-
#[cfg(any(feature = "rustsha1", feature = "fast-sha1"))]
77-
pub fn hasher(kind: gix_hash::Kind) -> Hasher {
78-
match kind {
79-
gix_hash::Kind::Sha1 => Hasher::default(),
80-
}
81-
}
82-
83-
/// Compute the hash of `kind` for the bytes in the file at `path`, hashing only the first `num_bytes_from_start`
84-
/// while initializing and calling `progress`.
85-
///
86-
/// `num_bytes_from_start` is useful to avoid reading trailing hashes, which are never part of the hash itself,
87-
/// denoting the amount of bytes to hash starting from the beginning of the file.
88-
///
89-
/// # Note
90-
///
91-
/// * Only available with the `gix-object` feature enabled due to usage of the [`gix_hash::Kind`] enum and the
92-
/// [`gix_hash::ObjectId`] return value.
93-
/// * [Interrupts][crate::interrupt] are supported.
94-
#[cfg(all(feature = "progress", any(feature = "rustsha1", feature = "fast-sha1")))]
95-
pub fn bytes_of_file(
96-
path: &std::path::Path,
97-
num_bytes_from_start: u64,
98-
kind: gix_hash::Kind,
99-
progress: &mut dyn crate::progress::Progress,
100-
should_interrupt: &std::sync::atomic::AtomicBool,
101-
) -> std::io::Result<gix_hash::ObjectId> {
102-
bytes(
103-
&mut std::fs::File::open(path)?,
104-
num_bytes_from_start,
105-
kind,
106-
progress,
107-
should_interrupt,
108-
)
109-
}
110-
111-
/// Similar to [`bytes_of_file`], but operates on a stream of bytes.
112-
#[cfg(all(feature = "progress", any(feature = "rustsha1", feature = "fast-sha1")))]
113-
pub fn bytes(
114-
read: &mut dyn std::io::Read,
115-
num_bytes_from_start: u64,
116-
kind: gix_hash::Kind,
117-
progress: &mut dyn crate::progress::Progress,
118-
should_interrupt: &std::sync::atomic::AtomicBool,
119-
) -> std::io::Result<gix_hash::ObjectId> {
120-
bytes_with_hasher(read, num_bytes_from_start, hasher(kind), progress, should_interrupt)
121-
}
122-
123-
/// Similar to [`bytes()`], but takes a `hasher` instead of a hash kind.
124-
#[cfg(all(feature = "progress", any(feature = "rustsha1", feature = "fast-sha1")))]
125-
pub fn bytes_with_hasher(
126-
read: &mut dyn std::io::Read,
127-
num_bytes_from_start: u64,
128-
mut hasher: Hasher,
129-
progress: &mut dyn crate::progress::Progress,
130-
should_interrupt: &std::sync::atomic::AtomicBool,
131-
) -> std::io::Result<gix_hash::ObjectId> {
132-
let start = std::time::Instant::now();
133-
// init progress before the possibility for failure, as convenience in case people want to recover
134-
progress.init(
135-
Some(num_bytes_from_start as prodash::progress::Step),
136-
crate::progress::bytes(),
137-
);
138-
139-
const BUF_SIZE: usize = u16::MAX as usize;
140-
let mut buf = [0u8; BUF_SIZE];
141-
let mut bytes_left = num_bytes_from_start;
142-
143-
while bytes_left > 0 {
144-
let out = &mut buf[..BUF_SIZE.min(bytes_left as usize)];
145-
read.read_exact(out)?;
146-
bytes_left -= out.len() as u64;
147-
progress.inc_by(out.len());
148-
hasher.update(out);
149-
if should_interrupt.load(std::sync::atomic::Ordering::SeqCst) {
150-
return Err(std::io::Error::new(std::io::ErrorKind::Other, "Interrupted"));
151-
}
152-
}
153-
154-
let id = gix_hash::ObjectId::from(hasher.digest());
155-
progress.show_throughput(start);
156-
Ok(id)
157-
}
158-
159-
#[cfg(any(feature = "rustsha1", feature = "fast-sha1"))]
160-
mod write {
161-
use crate::hash::Hasher;
162-
163-
/// A utility to automatically generate a hash while writing into an inner writer.
164-
pub struct Write<T> {
165-
/// The hash implementation.
166-
pub hash: Hasher,
167-
/// The inner writer.
168-
pub inner: T,
169-
}
170-
171-
impl<T> std::io::Write for Write<T>
172-
where
173-
T: std::io::Write,
174-
{
175-
fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
176-
let written = self.inner.write(buf)?;
177-
self.hash.update(&buf[..written]);
178-
Ok(written)
179-
}
180-
181-
fn flush(&mut self) -> std::io::Result<()> {
182-
self.inner.flush()
183-
}
184-
}
185-
186-
impl<T> Write<T>
187-
where
188-
T: std::io::Write,
189-
{
190-
/// Create a new hash writer which hashes all bytes written to `inner` with a hash of `kind`.
191-
pub fn new(inner: T, object_hash: gix_hash::Kind) -> Self {
192-
match object_hash {
193-
gix_hash::Kind::Sha1 => Write {
194-
inner,
195-
hash: Hasher::default(),
196-
},
197-
}
198-
}
199-
}
200-
}
201-
#[cfg(any(feature = "rustsha1", feature = "fast-sha1"))]
202-
pub use write::Write;

Diff for: gix-features/tests/hash.rs

-16
This file was deleted.

Diff for: gix-hash/Cargo.toml

+6
Original file line numberDiff line numberDiff line change
@@ -16,13 +16,19 @@ doctest = false
1616
test = false
1717

1818
[features]
19+
# Temporary, to avoid a circular dependency on `gix-features`.
20+
progress-unit-bytes = ["prodash/unit-bytes"]
1921
## Data structures implement `serde::Serialize` and `serde::Deserialize`.
2022
serde = ["dep:serde"]
2123

2224
[dependencies]
25+
# Temporary, to avoid a circular dependency on `gix-features`.
26+
prodash = "29.0.1"
27+
2328
thiserror = "2.0.0"
2429
faster-hex = { version = "0.9.0" }
2530
serde = { version = "1.0.114", optional = true, default-features = false, features = ["derive"] }
31+
sha1-checked = { version = "0.10.0", default-features = false }
2632

2733
document-features = { version = "0.2.0", optional = true }
2834

0 commit comments

Comments
 (0)