-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Support and test type serialization #4693
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cc: @mpharrigan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad to see this was not too hard to add! I'm slightly confused by why the special cases are needed, and it feels strange that this somehow brings back "cirq.google.blah".
@@ -53,6 +53,10 @@ class KeyValueExecutableSpec(ExecutableSpec): | |||
executable_family: str | |||
key_value_pairs: Tuple[Tuple[str, Any], ...] = () | |||
|
|||
@classmethod | |||
def _json_cirq_type_(cls) -> str: | |||
return 'cirq.google.KeyValueExecutableSpec' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
slightly confused as to why these are not cirq_google. This seems to be the reason you need these to override the default behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two things:
- These are
cirq.google.foo
because that's what appears injson_resolver_cache.py
. Changing to cirq_google would require a backwards-incompatible change to our serialization format. - The default is no module name (e.g.
KeyValueExecutableSpec
). Even if this wascirq_google
, I'd still need the override for these cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Up until now, the source of truth for the cirq_type field in the resultant dictionary and/or serialized form has been from the value returned in _json_dict_
.
- why do we need to query this separate from the value there
- now this information is redundant and could become out-of-sync
There's also structure we're missing here: the cirq_type is still the class name, just namespaced. Namespaces are one honking great idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_json_dict_
is an instance method - I can't call it to retrieve thecirq_type
field without instantiating the type, which is what type-serialization is trying to avoid.- True. If you'd like, I could modify the affected classes to use their
json_cirq_type
method in constructing their_json_dict_
.
Namespacing comes back to the backwards-compatibility concern I noted above: all old serialization formats must be deserializable. If we want to globally enforce namespaces in cirq_type
values, pre-v1.0 is a good time to do it, but that should be raised as a separate issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "cirq_type" value used to be a completely arbitrary string with the only condition that each object type provide a unique one. It was completely specified by the specific key-value pair in the dictionary returned by _json_dict_
and was only ever consumed by the resolver. By convention, it was the name of the class for top-level cirq objects. This convention was aided by the obj_to_dict_helper
and dataclass_json_dict
helper functions that could help one implement the _json_dict_
method. There was also a convention I imagined where each objects' "cirq_type" would be namespaced with dots (.
) like cirq.google.Thing
. That's the convention aided by the namespace
argument in obj_to_dict_helper
and dataclass_json_dict
.
Philosophically, I think this PR represents a change from "by convention it works this way" to "by design it works this way". I think this is ok, but now our protocols (_json_dict_
and _json_cirq_type_
) have this redundancy. What do we do if someone defines a class where these conflict? We can modify some or all of the classes under our control to use _json_cirq_type_
to define _json_dict_
but that doesn't stop a future developer or a developer piggy-backing on the cirq json infrastructure (this happens!) from mucking it up.
Finally, re: namespace. I think you misunderstood. I don't care about "cirq.google" vs "cirq_google". I want the json_cirq_type protocol to have support for "this class's json_cirq_type still follow the rules: it's the python class name; i've just prepended some disambiguating string with a dot to the front of it".
What would happen if I did:
@classmethod
def _json_cirq_type_(cls) -> str:
return 'cirq.google.KeyValueExecuatbleSpec'
did you notice the typo? Compare with e.g.
class Thing:
cirq_type_namespace = 'recirq.qaoa'
def __init__(self, x, y):
...
...
def default_json_cirq_type:
if hasattr(obj, 'json_cirq_type'):
return obj.json_cirq_type()
if hasattr(obj, 'cirq_type_namespace'):
return obj.cirq_type_namespace + obj.__class__.__name__
return obj.__class__.__name__
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Philosophically, I think this PR represents a change from "by convention it works this way" to "by design it works this way". I think this is ok, but now our protocols (
_json_dict_
and_json_cirq_type_
) have this redundancy. What do we do if someone defines a class where these conflict?
The round-trip serialization test protects against conflicts in these protocols: both must match the value used in json_resolver_cache
, or deserialization will fail. (The typo example above would encounter this test failure.) I see what you mean about redundancy, but it's also present in the suggested "namespaces" design: even with default_json_cirq_type
, the Thing._json_dict_
method would need to refer to cirq_type_namespace
to be consistent.
What is the main goal here? If we just want to support namespacing, I can do that in this PR. However, if the goal is to remove redundancy, we should lean into the new design and have the cirq_type
field (both namespace and class name) be automatically populated by cirq.to_json
. That would be a more complicated change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we just want to support namespacing, I can do that in this PR. However, if the goal is to remove redundancy, we should lean into the new design and have the cirq_type field (both namespace and class name) be automatically populated by cirq.to_json. That would be a more complicated change.
I agree with this analysis. My philosophical point is that: while this PR looks innocuous enough, the complete implementation of this new protocol would be to lean into the new design and do the more complicated change.
I would appreciate it if you support namespacing in this PR and open a follow-on issue for removing the redundancy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #4698 to track the redundancy-removal step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added namespacing.
I think I have issues with this. Let me mull |
Objects of type type aren’t special in Python. They’re just objects. Why should they be special here? We can’t support this the “normal” way because we can’t put a There are some strong analogies:
We should make this work like (1) or like (2). Would love to hear your thoughts, though. |
I chatted with @95-martin-orion to explain myself. Some conclusions: (1) might not actually give us the serialization we want for the use case in mind. A full serialization of python objects of type type might be big, have a whole list of methods, or some other can of worms. That seems to imply that (2) might be more natural for this use case (2) might be more natural, but in some sense benefits enormously from being able to hijack the existing resolver infrastructure. @95-martin-orion to investigate whether (1) could be feasible (we aren't familiar with what objects of type type are) and whether (2) could be done by leveraging the resolver infrastructure without hijacking it. |
Further investigation following @mpharrigan's comment: From the Python On the other hand, (2) should be possible by using |
Thanks for investigating! I think I agree. I think we'd need to architect the |
I have demoted type-serialization to a pair of methods which can optionally be used in @mpharrigan, what comment are you referring to regarding |
Did you consider a class attribute instead of a class method? It could look a little cleaner. I wasn't sure exactly how things work in Python, so I did the following test. All give the desired namespace string: class MyThing:
namespace = 'cirq.google'
def __init__(self, x: int):
self.x = x
@classmethod
def _namespace_(cls):
return 'cirq.google'
print(MyThing.namespace)
print(MyThing(5).namespace)
print(MyThing._namespace_())
print(MyThing(5)._namespace_()) |
def _json_dict_(self) -> Dict[str, Any]: | ||
return dataclass_json_dict(self, namespace='cirq.google') | ||
return dataclass_json_dict(self, namespace=cirq.json_namespace(type(self))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems rather complicated. Why not namespace=self._json_namespace_())
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling cirq.json_namespace
is more friendly to changing _json_namespace_
in a backwards-compatible way, as it limits the number of places that need to change to the _json_namespace_
invocation in the protocol method. (We encountered something similar for _resolve_parameters_
in the past.)
This is primarily a concern outside of Cirq, but since these are the only examples in Cirq I figured I should use this format to encourage its use elsewhere.
I did, but (a) protocol methods are the more common pattern in Cirq, and (b) the syntax for class attributes differs between "normal" classes and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM. Thanks for helping understand your design and thanks for bearing with me
Prior to this PR, Cirq supported serialization of _instances_ of Cirq types, but not the types themselves. This PR adds serialization support for Cirq types, with the format: ``` { 'cirq_type': 'type', 'typename': $NAME } ``` where `$NAME` is the `cirq_type` of the object in its JSON representation. For type T, `$NAME` is usually `T.__name__`, but some types (mostly in `cirq_google`) do not follow this rule. The `json_cirq_type` protocol and `_json_cirq_type_` magic method are provided to handle this. It is worth noting that this PR explicitly **does not** support serialization of non-Cirq types (e.g. python builtins, sympy and numpy objects) despite instances of these objects being serializable in Cirq. This support can be added to `json_cirq_type` and `_cirq_object_hook` in `json_serialization.py` if we decide it is necessary; I left it out of this PR as it is not required by the motivating changes behind this PR (quantumlib#4640 and sub-PRs).
Prior to this PR, Cirq supported serialization of _instances_ of Cirq types, but not the types themselves. This PR adds serialization support for Cirq types, with the format: ``` { 'cirq_type': 'type', 'typename': $NAME } ``` where `$NAME` is the `cirq_type` of the object in its JSON representation. For type T, `$NAME` is usually `T.__name__`, but some types (mostly in `cirq_google`) do not follow this rule. The `json_cirq_type` protocol and `_json_cirq_type_` magic method are provided to handle this. It is worth noting that this PR explicitly **does not** support serialization of non-Cirq types (e.g. python builtins, sympy and numpy objects) despite instances of these objects being serializable in Cirq. This support can be added to `json_cirq_type` and `_cirq_object_hook` in `json_serialization.py` if we decide it is necessary; I left it out of this PR as it is not required by the motivating changes behind this PR (quantumlib#4640 and sub-PRs).
Prior to this PR, Cirq supported serialization of instances of Cirq types, but not the types themselves. This PR adds serialization support for Cirq types, with the format:
where
$NAME
is thecirq_type
of the object in its JSON representation. For type T,$NAME
is usuallyT.__name__
, but some types (mostly incirq_google
) do not follow this rule. Thejson_cirq_type
protocol and_json_cirq_type_
magic method are provided to handle this.It is worth noting that this PR explicitly does not support serialization of non-Cirq types (e.g. python builtins, sympy and numpy objects) despite instances of these objects being serializable in Cirq. This support can be added to
json_cirq_type
and_cirq_object_hook
injson_serialization.py
if we decide it is necessary; I left it out of this PR as it is not required by the motivating changes behind this PR (#4640 and sub-PRs).