Skip to content

Avoid reading input files to memory #158

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jku opened this issue Jul 11, 2022 · 5 comments · Fixed by #329
Closed

Avoid reading input files to memory #158

jku opened this issue Jul 11, 2022 · 5 comments · Fixed by #329
Labels
enhancement New feature or request

Comments

@jku
Copy link
Member

jku commented Jul 11, 2022

The API uses bytes as type for all file contents. This is fine for certs and signatures, but the input files can be arbitrarily large: the API should not force reading input files to memory all at once.

@jku jku added the enhancement New feature or request label Jul 11, 2022
@woodruffw
Copy link
Member

I agree that we should avoid reading inputs fully into memory, although I think we're actually limited by our dependencies in terms of actually doing this. For example, we use cryptography's EllipticCurvePrivateKey.sign, which only takes a bytes as input (not a file-like object).

We could maybe work/hack around that by using mmap and bytearray, but that requires investigation (and a confirmation from upstream that they intend to support non-bytes).

@woodruffw woodruffw added the upstream Requires upstream work or coordination label Jul 11, 2022
@jku
Copy link
Member Author

jku commented Jul 11, 2022

oh I did not realize that -- this is disappointing from cryptography (I suppose the actual process still works with chunks at a time).

If it's an upstream issue then I'm fine with closing this as well: I expected this to be an easy fix not an issue with a dependency.

@woodruffw
Copy link
Member

I'm okay with keeping it open for now! I think we should investigate the feasibility of using bytearray, and I can chat with the cryptography maintainers to see if they have any insights here.

@woodruffw
Copy link
Member

woodruffw commented Dec 3, 2022

One of the cryptography maintainers pointed out to me that ECDSA signing is defined by signing over the digest of the input rather than the full input, and that cryptography's API's actually supports pre-hashing the input and passing that in instead of the full bytes.

As a result, this should actually be possible: we can stream the input into the digest function instead, and then pass Prehashed(SHA256()) during signing instead of having the signing operating do the hash internally!

Relevant APIs: https://cryptography.io/en/latest/hazmat/primitives/asymmetric/utils/#cryptography.hazmat.primitives.asymmetric.utils.Prehashed

@woodruffw woodruffw removed the upstream Requires upstream work or coordination label Dec 3, 2022
woodruffw added a commit that referenced this issue Dec 6, 2022
Closes #158.

Signed-off-by: William Woodruff <[email protected]>
@woodruffw
Copy link
Member

Doing this in #329.

woodruffw added a commit that referenced this issue Dec 7, 2022
* sigstore: stream input into signing

Closes #158.

Signed-off-by: William Woodruff <[email protected]>

* _utils: ignore some mypy errors

See: python/typing#659

Signed-off-by: William Woodruff <[email protected]>

* test_sign: fix signing test

Signed-off-by: William Woodruff <[email protected]>

* test_utils: test correctness of our digest streaming

Signed-off-by: William Woodruff <[email protected]>

* sigstore, test: stream verification as well

Signed-off-by: William Woodruff <[email protected]>

* _utils: document the security properties of sha256_streaming

Signed-off-by: William Woodruff <[email protected]>

Signed-off-by: William Woodruff <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants