-
Notifications
You must be signed in to change notification settings - Fork 27
Allow implementations that return results instead of mutating inputs #66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I agree with this proposal as outlined. We will need to be careful how we describe this in the spec doc, though. [Assuming the intent has not changed] my primary issue with the previous PR where you attempted to do this is that so many words are so heavily overloaded in this space. Talking about "outputs" is hard because that so often means "return values", for example. As long as we can keep the scope of responsibilities clear, I like this. |
Is there a reason the output of OnEncrypt and OnDecrypt have to be the same type as the input? As far as I can tell, we don't want any Keyring implementation to be changing the AlgorithmSuite, replacing the EncryptionContext, and setting the SigningKey and VerificationKey, but by having these methods return EncryptionMaterials and DecryptionMaterials they would have that capability, even if the spec forbids it. Would it make sense to define separate EncryptResult and DecryptResult types that only contain what we expect OnEncrypt and OnDecrypt to produce? |
No. For reasons we should write down somewhere more permanent. The short version is that we have this big pile of values that we think are related and we want to make sure that the relationship intent is explicit at every point in the chain. If you have to explicitly specify the related values every time you (for example) add an EDK, then while you could be lying, you have to express the intent that you want this EDK to be related to specific encryption context, algorithm suite, and plaintext data key. If you were just returning the EDK by itself then the next piece up the chain has to rely on you not getting confused and returning the wrong thing. If you have to express your intent of related values, then the next thing up the chain can at least check to verify that your intent and their intent match. |
Yup, I had exactly the same question as @WesleyRosenblum initially and have only recently started to come around to the fact that it is better to keep the values packaged together, even through it would be more efficient or less confusing to not produce redundant data. I feel it should be possible to have isomorphic implementations that don't literally return all the pieces, but I'd rather tackle that separately if and when we feel it's worth allowing. It may be worth documenting this reasoning right in the specification, or at least linking from there to some other page that does. |
The other part of the problem with something like this is that someone else now needs to be responsible for assembling the stuff that a keyring produces. This also comes back to the intent of the keyring: if I call a keyring and it just throws back some EDKs, I don't know how that keyring intended them to be used or how they should be incorporated into the materials I already have. One of the big reasons for the keyrings using the same structure for input as output was because it forces you to be very explicit about the scope of responsibility of each component. Yes, it can be less efficient if you want to avoid side mutating action, but it reduces complexity and ambiguity enough that it's worth it. |
Given these arguments, is there still a strong preference towards returning results versus mutating the inputs? With the requirement to partially copy the inputs into the outputs, I'm not sure it results in any less error prone or user friendly code than just mutating the inputs, where at least it can be controlled which parameters are mutable and which are not. |
We want our spec to have language that allows both, given that "have an object that you pass around and mutate" doesn't make sense in some languages. As for how we describe behaviors in the spec, I think expressing these behaviors functionally (with the disclaimer that this could also be implemented by mutating objects) is more clear/easier to understand than the other way around. It might depend a bit on the context, but my gut feeling is that it will lead to simpler wording that is easier to check for correctness and completeness. I'm +1 for the suggestion that we describe somewhere what we mean by terms like "input"/"output"/"return". We should express that there is fuzzyness around these terms, and how they might mean different things for different languages. Note that we already have a preamble that discusses what "structure" might mean for different implementations (because C is makes everything difficult), and while this is a different issue, this discussion feels like it belongs alongside that one. |
My question was regarding the implementation and if there was any preference implied by this update to the spec towards immutable objects. Java can support both returning a result or mutating an object, but as I was updating the code to shift from mutable to immutable materials, it felt less user friendly and more error prone. Maybe its not the place of the spec to indicate a preference if either are acceptable. |
That's a good point. I feel like which one is "preferable" ultimately depends on the specific implementation in question. That being said, will wording things functionally lead to developers also developing something functional when that might not be the right solution for them? Perhaps, but the spec will never be able to definitively make implementation choices for developers. In this specific case, I'm not sure what we could state that would be helpful for developers making such a decision. If we can think of any such statement, perhaps the preamble is the place for it. |
Agreed re ergonomics. That said, I think we can find ergonomics for most languages that make sense for either. ex: # mutable
materials.add_edk(new_edk, trace)
# immutable
new_materials = old_materials.with_edk(new_edk, trace) |
Whatever ergonomics we come up with, I want to highlight this: "we want to make sure that the relationship intent is explicit". |
On the point of "hard to mis-use": I wonder if the fact that we have a list of EDKs as both input and output might lead to bugs in user-implemented Keyrings, where they return just the EDK encrypted by this keyring and accidentally drop the list passed in on the floor. Are Keyrings expected or even permitted to change their behaviour based on the input list of EDKs? |
This is exactly why in JS they are in an immutable list. It is impossible to remove EDKs. Edit: Sorry, not really "immutable", and you can add. |
A higher-level thought I had this morning: I assert the specification should specify correct behaviour, and not enforce opinions on how to make implementations easy to use and hard to mis-use. I feel like we're going around in circles trying to figure out THE best way to avoid mistakes in implementation (either of the ESDK itself or things like user-implemented Keyrings), but that's extremely difficult/impossible to do across an unbounded set of target languages with very different concepts and ergonomics. Having mutable materials might be convenient and less error-prone in C, but be a real nightmare in Haskell. If we define a very flexible idea of how an operation produces output/results, we can focus on what values these operations MUST produce, and in the preamble of the spec RECOMMEND how they can be implemented in various programming paradigms. Some of this content could be in the security considerations sections. That means, for example, the definition of OnEncrypt is to produce a list of new EDKs and new keyring traces, and the job of a MultiKeyring and/or a default CMM is to combine the results from multiple Keyrings. In some cases providing a callback to emit EDKs as in the JS version might be the most convenient, or in a more pure functional language these values will be literally returned from functions. |
I'll keep jumping back on this soapbox as often as I need to , but this is explicitly the complexity that the the Keyring API was designed to avoid. |
In that case, I think what we need is a re-wording of the Keyring solution such that it doesn't depend on mutation to avoid that complexity, if that's possible. I know we're on the same page in terms of supporting immutability, which is great! I'm having trouble figuring out the solution here though, and could use your help. |
Mutation is not the problem. The problem is keeping the bounds of responsibility clear and consistent. A keyring on-encrypt action MUST only ever result in a complete encryption materials grouping. The keyring that causes new components to be added to the logical grouping that is a collection of materials over time MUST be the only thing that needs to understand what those new components are or how they fit in. Individual components MUST NEVER pass outside of a keyring's scope of responsibility on their own. The APIs and scope of responsibility are very clearly defined. Immutable or mutable doesn't change this. |
After some offline conversation, we've agreed on the variation originally spelled out in the issue description. My concern about functional-style keyings accidentally failing to include the input list of EDKs in the result will be addressed by caller-side guards, in objects like the default CMM. Such guards can also verify that values that MUST not change have not been in the result as well. |
Aligns with the intended changes to the specification as per awslabs/aws-encryption-sdk-specification#66. Also fixes the MultiKeyring failure behaviour and adds some explicit tests for it.
The current version of the specification describes the behaviour of some interfaces in terms of what values and structures they take as input, and what portions of that data they update if they are successful.
There will exist implementations in which it is much more natural to return results as independent, freshly-allocated data, instead of mutating inputs. This allows structures to be immutable, which can generally help to avoid many types of bugs. In addition, some programming languages do not naturally support code with mutation and side-effects as naturally as the pure functional programming style.
This issue proposes rephrasing the existing interfaces in a way that describes what values are accepted as input and what values are produced as output. It should NOT invalidate any existing implementations. Instead, there should be a preamble (probably in framework/structures.md) that describes how equivalent implementations are allowed that "produce output" by mutating existing structures instead. If the current description specifies that a value in a structure MUST NOT be modified by an interface, the functional-style version will specify that the value in the returned structure MUST match the corresponding value in the input.
In particular, the PR that addresses this issue will make the following changes to the Keyring interface (which may be hard to perceive accurately in the actual diff):
OnEncrypt: instead of modifying the input Encryption Materials, the interface will return an independent Encryption Materials value. Each field other than the encrypted data keys and the keyring trace must be equal to the corresponding field of the input.
The list of encrypted data keys and the keyring trace specified in the input materials each must be a prefix of the corresponding fields in the output.
OnDecrypt: instead of modifying the input Decryption Materials, the interface will return an independent Decryption Materials value. Each field other than the plaintext data key and the keyring trace must be equal to the corresponding field of the input. The keyring trace specified in the input materials must be a prefix of the trace in the output. The plaintext data key may or may not be populated in this result.
This is closely related with specifying error conditions, as some implementations will indicate different flavours of failure through different return values. Implementations may define one "success" outcome that contains the above output values and multiple "failure" outcomes, either as different return types, or as thrown exceptions, or some other mechanism. The above should be a complete and self-contained idea by itself, though, and I submit that we should commit this change first before improving the specification of failure.
The text was updated successfully, but these errors were encountered: