Skip to content

Root hash signature verification #527

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 11 commits into from
Closed

Root hash signature verification #527

wants to merge 11 commits into from

Conversation

jleightcap
Copy link
Contributor

@jleightcap jleightcap commented Mar 6, 2023

Closes #248.

@jleightcap
Copy link
Contributor Author

Why have the sigstore_protobuf_specs.dev.sigstore.*.v1 types in addition to some manually-defined... very close types?

What's been throwing me off w.r.t. expanding the VerificationMaterial checkpoint field is that models.py declares VerificationMaterials (with an s) that we seem to actually operate over in verify.

@woodruffw
Copy link
Member

Why have the sigstore_protobuf_specs.dev.sigstore.*.v1 types in addition to some manually-defined... very close types?

The types under sigstore_protobuf_specs are only meant for client interoperation -- they're pure models, without any associated functionality. We marshal from them early on in sigstore-python's lifecycle and into less flexible models that reflect the subset of Sigstore functionality we actually intend to support 🙂

@woodruffw
Copy link
Member

What's been throwing me off w.r.t. expanding the VerificationMaterial checkpoint field is that models.py declares VerificationMaterials (with an s) that we seem to actually operate over in verify.

VerificationMaterials and VerificationMaterial are confusingly named, and are conceptually very closely related: the former is our restricted model with verification functionality, and the latter is the interoperation model.

In this context, I think what we need to do is add a new checkpoint attribute to our LogEntry model, pulling it from the bundle's inclusion proof. That checkpoint then needs to be verified per the steps in #248. The relevant site for that is probably VerificationMaterials.from_bundle.

@jleightcap
Copy link
Contributor Author

Ah, OK -- so that extends to a lot of the extremely close types, same with LogInclusionProof which was more obviously useful in leveraging pydantic. My next step then is to make sure I'm threading through the protobuf field correctly...

@jleightcap
Copy link
Contributor Author

OK, this if falling into place now that I actually have a checkpoint around 🙂

More of a nit: would this be appropriate to live in verify_merkle_inclusion along with the other inclusion_proof checks? I'm planning on breaking it out into its own definition.

@woodruffw
Copy link
Member

Ah, OK -- so that extends to a lot of the extremely close types, same with LogInclusionProof which was more obviously useful in leveraging pydantic. My next step then is to make sure I'm threading through the protobuf field correctly...

Yep, exactly; there's a roughly 1:1 correspondence between many sigstore and sigstore_protobuf_specs types, with the former being what we want to marshal into (because we can do stronger correctness checks versus the protobuf definitions).

@jleightcap
Copy link
Contributor Author

I think I see how to translate the Go implementation just based on skimming some Go docs, but I'm hardly proficient at Go. I'll get my best effort up but would appreciate a second glance.

@jleightcap jleightcap force-pushed the jl/248 branch 2 times, most recently from 3e3a2d7 to 5113033 Compare March 20, 2023 21:00
@jleightcap
Copy link
Contributor Author

I'm unsure how to handle the contents of the checkpoint: str. The contents as of the commits on this PR, for example:

rekor.sigstage.dev - 8050909264565447525
5423838
uVcNv0SegKjexoiMoZMYKCAmXh06CP2W0X5dc6yZg/k=
Timestamp: 1679349379012118479

— rekor.sigstage.dev 0y8wozBFAiEA2JO/0FqKe3EGtfLu25UW+HeKrDqxgvofoKxcHZTR920CIHfdD2Am/L2i0Fofpbnl+FifnH9B/t8kJ4nZ1JLqFq65

I've been trying to determine what these strings indicate based on the openapi.yml, but the format: signedCheckpoint clue seems to only lead to the Rekor Go source code.

And in terms of usage, should this string be parsed into a pydantic-verified class or something?

@woodruffw
Copy link
Member

It took a little bit of digging, but it looks like a checkpoint is a SignedNote, which has an unmarshalling implementation defined here:

https://github.com/sigstore/rekor/blob/4b1fa6661cc6dfbc844b4c6ed9b1f44e7c5ae1c0/pkg/util/signed_note.go#L129-L187

And in terms of usage, should this string be parsed into a pydantic-verified class or something?

Yes! This should be one or more Pydantic data models.

@woodruffw
Copy link
Member

Specifically, here's the SignedNote representation:

type SignedNote struct {
	// Textual representation of a note to sign.
	Note string
	// Signatures are one or more signature lines covering the payload
	Signatures []note.Signature
}

and a SignedCheckpoint is:

type SignedCheckpoint struct {
	Checkpoint
	SignedNote
}

where unmarshalling is defined as:

func (r *SignedCheckpoint) UnmarshalText(data []byte) error {
	s := SignedNote{}
	if err := s.UnmarshalText([]byte(data)); err != nil {
		return fmt.Errorf("unmarshalling signed note: %w", err)
	}
	c := Checkpoint{}
	if err := c.UnmarshalCheckpoint([]byte(s.Note)); err != nil {
		return fmt.Errorf("unmarshalling checkpoint: %w", err)
	}
	*r = SignedCheckpoint{Checkpoint: c, SignedNote: s}
	return nil
}

and the underlying Checkpoint type:

type Checkpoint struct {
	// Origin is the unique identifier/version string
	Origin string
	// Size is the number of entries in the log at this checkpoint.
	Size uint64
	// Hash is the hash which commits to the contents of the entire log.
	Hash []byte
	// OtherContent is any additional data to be included in the signed payload; each element is assumed to be one line
	OtherContent []string
}

// UnmarshalText parses the common formatted checkpoint data and stores the result
// in the Checkpoint.
//
// The supplied data is expected to begin with the following 3 lines of text,
// each followed by a newline:
// <ecosystem/version string>
// <decimal representation of log size>
// <base64 representation of root hash>
// <optional non-empty line of other content>...
// <optional non-empty line of other content>...
//
// This will discard any content found after the checkpoint (including signatures)
func (c *Checkpoint) UnmarshalCheckpoint(data []byte) error {
	l := bytes.Split(data, []byte("\n"))
	if len(l) < 4 {
		return errors.New("invalid checkpoint - too few newlines")
	}
	origin := string(l[0])
	if len(origin) == 0 {
		return errors.New("invalid checkpoint - empty ecosystem")
	}
	size, err := strconv.ParseUint(string(l[1]), 10, 64)
	if err != nil {
		return fmt.Errorf("invalid checkpoint - size invalid: %w", err)
	}
	h, err := base64.StdEncoding.DecodeString(string(l[2]))
	if err != nil {
		return fmt.Errorf("invalid checkpoint - invalid hash: %w", err)
	}
	*c = Checkpoint{
		Origin: origin,
		Size:   size,
		Hash:   h,
	}
	if len(l) >= 3 {
		for _, line := range l[3:] {
			if len(line) == 0 {
				break
			}
			c.OtherContent = append(c.OtherContent, string(line))
		}
	}
	return nil
}


signatures: list[Signature] = []

sig_parser = re.compile(r"\u2014 (\S+) (\S+)\n")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if a regex is the right tool here. This is mostly intended to follow the Go implementation, which does something similar with fscanf. It does enforce the line-oriented format more succinctly than doing something like... breaking on spaces and asserting the formatting is correct. However I'm not sure how well this would handle a malformed input.

@jleightcap
Copy link
Contributor Author

For the same example,

rekor.sigstage.dev - 8050909264565447525
5423838
uVcNv0SegKjexoiMoZMYKCAmXh06CP2W0X5dc6yZg/k=
Timestamp: 1679349379012118479

— rekor.sigstage.dev 0y8wozBFAiEA2JO/0FqKe3EGtfLu25UW+HeKrDqxgvofoKxcHZTR920CIHfdD2Am/L2i0Fofpbnl+FifnH9B/t8kJ4nZ1JLqFq65

Is parsed into

signed_note=SignedNote(
  note='rekor.sigstage.dev - 8050909264565447525\n5541353\ngjPqyxuCAsF6q8lBajluLf7bx13CTooqhEjbJaQj9CQ=\nTimestamp: 1679538328208274614\n',
  signatures=[
    Signature(
      name='rekor.sigstage.dev',
      sig_hash=b'\xd3/0\xa3',
      sig_base64=b'MEYCIQCUwXGGOMZandt2PVF5xAld5fU/jeMW3rzdQjjmSAbMtgIhAMouGEObW4MEH2O5IYFtO8qKZHa1bYbWm/nLJiwmOM9n'
    )
  ]
)
checkpoint=Checkpoint(
  origin='rekor.sigstage.dev - 8050909264565447525',
  log_size=5541353,
  log_hash=b'\x823\xea\xcb\x1b\x82\x02\xc1z\xab\xc9Aj9n-\xfe\xdb\xc7]\xc2N\x8a*\x84H\xdb%\xa4#\xf4$',
  other_content=['Timestamp: 1679538328208274614']
)

@jleightcap
Copy link
Contributor Author

What I've been hitting my head against is gettin the inclusion proof's member root hash into the same "format" as the parsed Checkpoint type -- simply; the LogInclusionProof hash is 64-bytes (b'b10cdd64791dea64f78dbbe75b984c19b942dc55bffcb7bb205fe3b29b59db2e') and the SignedNote hash is a 44-character base64-string (sQzdZHkd6mT3jbvnW5hMGblC3FW//Le7IF/jsptZ2y4=)

What I had to learn is that = should be a big flag this is base64-encoded, and should be base64.b64decode(note_hash).hex() 😶‍🌫️

Now everything lines up 🙂

@jleightcap jleightcap force-pushed the jl/248 branch 3 times, most recently from 28d9f75 to 2eb51ca Compare March 28, 2023 14:20
@@ -133,3 +135,33 @@ def verify_merkle_inclusion(entry: LogEntry) -> None:
f"Inclusion proof contains invalid root hash: expected {inclusion_proof}, calculated "
f"{calc_hash}"
)


def verify_checkpoint(client: RekorClient, entry: LogEntry) -> None:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

verifier.py has a block intended for oneline verifications that previously called into the verify_merkle_inclusion here. So, by adding an additional parallel test, this seemed like the natural place for this function to live -- however, it's not particularly "merkle" related.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think this should probably live under _internal.rekor.checkpoint.

@jleightcap jleightcap changed the title [WIP] Root Hash Verification Root hash signature verification Mar 28, 2023
@woodruffw woodruffw requested a review from tetsuo-cpp March 29, 2023 02:17
if inclusion_proof.checkpoint is None:
return

# verififcaiton occurs in two stages:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# verififcaiton occurs in two stages:
# verification occurs in two stages:

Comment on lines +147 to +148
if inclusion_proof.checkpoint is None:
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be an error case? We probably shouldn't silently accept inclusion proofs that don't contain a checkpoint.

signed_checkpoint = SignedCheckpoint.from_text(inclusion_proof.checkpoint)

# FIXME(jl): produces an invalid signature exception -- am I using the keyring correctly?
# signed_checkpoint.signed_note.verify(client)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be missing it, but where is this verify method defined? I don't see it declared on SignedNote.

Signed-off-by: Jack Leightcap <[email protected]>

TASK(jl): `checkpoint` breakout

Signed-off-by: Jack Leightcap <[email protected]>
@woodruffw
Copy link
Member

Superseded by #634.

@woodruffw woodruffw closed this Apr 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Include Root Hash verification when verifying a Rekor Inclusion Proof
2 participants