Skip to content

Support and test type serialization #4693

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Nov 23, 2021

Conversation

95-martin-orion
Copy link
Collaborator

Prior to this PR, Cirq supported serialization of instances of Cirq types, but not the types themselves. This PR adds serialization support for Cirq types, with the format:

{
    'cirq_type': 'type',
    'typename': $NAME
}

where $NAME is the cirq_type of the object in its JSON representation. For type T, $NAME is usually T.__name__, but some types (mostly in cirq_google) do not follow this rule. The json_cirq_type protocol and _json_cirq_type_ magic method are provided to handle this.

It is worth noting that this PR explicitly does not support serialization of non-Cirq types (e.g. python builtins, sympy and numpy objects) despite instances of these objects being serializable in Cirq. This support can be added to json_cirq_type and _cirq_object_hook in json_serialization.py if we decide it is necessary; I left it out of this PR as it is not required by the motivating changes behind this PR (#4640 and sub-PRs).

@google-cla google-cla bot added the cla: yes Makes googlebot stop complaining. label Nov 17, 2021
@CirqBot CirqBot added the size: M 50< lines changed <250 label Nov 17, 2021
@MichaelBroughton
Copy link
Collaborator

cc: @mpharrigan

Copy link
Collaborator

@dabacon dabacon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad to see this was not too hard to add! I'm slightly confused by why the special cases are needed, and it feels strange that this somehow brings back "cirq.google.blah".

@@ -53,6 +53,10 @@ class KeyValueExecutableSpec(ExecutableSpec):
executable_family: str
key_value_pairs: Tuple[Tuple[str, Any], ...] = ()

@classmethod
def _json_cirq_type_(cls) -> str:
return 'cirq.google.KeyValueExecutableSpec'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slightly confused as to why these are not cirq_google. This seems to be the reason you need these to override the default behavior?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things:

  1. These are cirq.google.foo because that's what appears in json_resolver_cache.py. Changing to cirq_google would require a backwards-incompatible change to our serialization format.
  2. The default is no module name (e.g. KeyValueExecutableSpec). Even if this was cirq_google, I'd still need the override for these cases.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Up until now, the source of truth for the cirq_type field in the resultant dictionary and/or serialized form has been from the value returned in _json_dict_.

  1. why do we need to query this separate from the value there
  2. now this information is redundant and could become out-of-sync

There's also structure we're missing here: the cirq_type is still the class name, just namespaced. Namespaces are one honking great idea.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. _json_dict_ is an instance method - I can't call it to retrieve the cirq_type field without instantiating the type, which is what type-serialization is trying to avoid.
  2. True. If you'd like, I could modify the affected classes to use their json_cirq_type method in constructing their _json_dict_.

Namespacing comes back to the backwards-compatibility concern I noted above: all old serialization formats must be deserializable. If we want to globally enforce namespaces in cirq_type values, pre-v1.0 is a good time to do it, but that should be raised as a separate issue.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "cirq_type" value used to be a completely arbitrary string with the only condition that each object type provide a unique one. It was completely specified by the specific key-value pair in the dictionary returned by _json_dict_ and was only ever consumed by the resolver. By convention, it was the name of the class for top-level cirq objects. This convention was aided by the obj_to_dict_helper and dataclass_json_dict helper functions that could help one implement the _json_dict_ method. There was also a convention I imagined where each objects' "cirq_type" would be namespaced with dots (.) like cirq.google.Thing. That's the convention aided by the namespace argument in obj_to_dict_helper and dataclass_json_dict.

Philosophically, I think this PR represents a change from "by convention it works this way" to "by design it works this way". I think this is ok, but now our protocols (_json_dict_ and _json_cirq_type_) have this redundancy. What do we do if someone defines a class where these conflict? We can modify some or all of the classes under our control to use _json_cirq_type_ to define _json_dict_ but that doesn't stop a future developer or a developer piggy-backing on the cirq json infrastructure (this happens!) from mucking it up.

Finally, re: namespace. I think you misunderstood. I don't care about "cirq.google" vs "cirq_google". I want the json_cirq_type protocol to have support for "this class's json_cirq_type still follow the rules: it's the python class name; i've just prepended some disambiguating string with a dot to the front of it".

What would happen if I did:

    @classmethod
    def _json_cirq_type_(cls) -> str:
        return 'cirq.google.KeyValueExecuatbleSpec'

did you notice the typo? Compare with e.g.

class Thing:
  cirq_type_namespace = 'recirq.qaoa'
  
  def __init__(self, x, y):
     ...

  ...

def default_json_cirq_type:
  if hasattr(obj, 'json_cirq_type'):
    return obj.json_cirq_type()
  
  if hasattr(obj, 'cirq_type_namespace'):
     return obj.cirq_type_namespace + obj.__class__.__name__
  
  return obj.__class__.__name__

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Philosophically, I think this PR represents a change from "by convention it works this way" to "by design it works this way". I think this is ok, but now our protocols (_json_dict_ and _json_cirq_type_) have this redundancy. What do we do if someone defines a class where these conflict?

The round-trip serialization test protects against conflicts in these protocols: both must match the value used in json_resolver_cache, or deserialization will fail. (The typo example above would encounter this test failure.) I see what you mean about redundancy, but it's also present in the suggested "namespaces" design: even with default_json_cirq_type, the Thing._json_dict_ method would need to refer to cirq_type_namespace to be consistent.

What is the main goal here? If we just want to support namespacing, I can do that in this PR. However, if the goal is to remove redundancy, we should lean into the new design and have the cirq_type field (both namespace and class name) be automatically populated by cirq.to_json. That would be a more complicated change.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we just want to support namespacing, I can do that in this PR. However, if the goal is to remove redundancy, we should lean into the new design and have the cirq_type field (both namespace and class name) be automatically populated by cirq.to_json. That would be a more complicated change.

I agree with this analysis. My philosophical point is that: while this PR looks innocuous enough, the complete implementation of this new protocol would be to lean into the new design and do the more complicated change.

I would appreciate it if you support namespacing in this PR and open a follow-on issue for removing the redundancy

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #4698 to track the redundancy-removal step.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added namespacing.

@mpharrigan
Copy link
Collaborator

I think I have issues with this. Let me mull

@mpharrigan
Copy link
Collaborator

Objects of type type aren’t special in Python. They’re just objects. Why should they be special here?

We can’t support this the “normal” way because we can’t put a _json_dict_ method onto the definition of the Type class, because this is in the Python standard library.

There are some strong analogies:

  1. We special case support for some sympy, numpy, and pandas objects. Their ‘cirq_type’ is namespaced with sympy, numpy, and pandas.
  2. We don’t provide support for some Python objects. Most notably dictionaries with non-string keys. The escape hatch is for cirq classes encapsulating such an object to marshall to list(d.items()) and back. But there’s a variety of Python things we don’t support serializing: e.g. datetime is one I’ve run into.

We should make this work like (1) or like (2). Would love to hear your thoughts, though.

@mpharrigan
Copy link
Collaborator

I chatted with @95-martin-orion to explain myself. Some conclusions:

(1) might not actually give us the serialization we want for the use case in mind. A full serialization of python objects of type type might be big, have a whole list of methods, or some other can of worms. That seems to imply that (2) might be more natural for this use case

(2) might be more natural, but in some sense benefits enormously from being able to hijack the existing resolver infrastructure.

@95-martin-orion to investigate whether (1) could be feasible (we aren't familiar with what objects of type type are) and whether (2) could be done by leveraging the resolver infrastructure without hijacking it.

@95-martin-orion
Copy link
Collaborator Author

Further investigation following @mpharrigan's comment:

From the Python type object documentation: "each type object stores a large number of values, mostly C function pointers". Given that the contents of a pointer do not persist across (de-)serialization, unless we were to serialize the functions themselves (1) is probably not feasible.

On the other hand, (2) should be possible by using cirq.DEFAULT_RESOLVERS, which is updated by each imported sub-module to include serializable types from that module. However, we would still need the json_cirq_type protocol for types with non-standard cirq_type values.

@mpharrigan
Copy link
Collaborator

Thanks for investigating! I think I agree.

I think we'd need to architect the json_cirq_type protocol more carefully in this case, see my comment.

@95-martin-orion
Copy link
Collaborator Author

I have demoted type-serialization to a pair of methods which can optionally be used in [_from]_json_dict to serialize types.

@mpharrigan, what comment are you referring to regarding json_cirq_type?

@mpharrigan
Copy link
Collaborator

#4693 (comment)

@mpharrigan
Copy link
Collaborator

Did you consider a class attribute instead of a class method? It could look a little cleaner. I wasn't sure exactly how things work in Python, so I did the following test. All give the desired namespace string:

class MyThing:
    namespace = 'cirq.google'

    def __init__(self, x: int):
        self.x = x

    @classmethod
    def _namespace_(cls):
        return 'cirq.google'


print(MyThing.namespace)
print(MyThing(5).namespace)
print(MyThing._namespace_())
print(MyThing(5)._namespace_())

def _json_dict_(self) -> Dict[str, Any]:
return dataclass_json_dict(self, namespace='cirq.google')
return dataclass_json_dict(self, namespace=cirq.json_namespace(type(self)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems rather complicated. Why not namespace=self._json_namespace_())?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling cirq.json_namespace is more friendly to changing _json_namespace_ in a backwards-compatible way, as it limits the number of places that need to change to the _json_namespace_ invocation in the protocol method. (We encountered something similar for _resolve_parameters_ in the past.)

This is primarily a concern outside of Cirq, but since these are the only examples in Cirq I figured I should use this format to encourage its use elsewhere.

@95-martin-orion
Copy link
Collaborator Author

Did you consider a class attribute instead of a class method?

I did, but (a) protocol methods are the more common pattern in Cirq, and (b) the syntax for class attributes differs between "normal" classes and @dataclasses, since @dataclass uses the class attribute syntax for instance attributes. Taking those into consideration, I felt @classmethod was preferable.

Copy link
Collaborator

@mpharrigan mpharrigan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM. Thanks for helping understand your design and thanks for bearing with me

@95-martin-orion 95-martin-orion added the automerge Tells CirqBot to sync and merge this PR. (If it's running.) label Nov 23, 2021
@CirqBot CirqBot added the front_of_queue_automerge CirqBot uses this label to indicate (and remember) what's being merged next. label Nov 23, 2021
@CirqBot CirqBot merged commit 42ac9da into quantumlib:master Nov 23, 2021
@CirqBot CirqBot removed automerge Tells CirqBot to sync and merge this PR. (If it's running.) front_of_queue_automerge CirqBot uses this label to indicate (and remember) what's being merged next. labels Nov 23, 2021
@95-martin-orion 95-martin-orion deleted the cirq-cereal-types branch November 23, 2021 15:59
rht pushed a commit to rht/Cirq that referenced this pull request May 1, 2023
Prior to this PR, Cirq supported serialization of _instances_ of Cirq types, but not the types themselves. This PR adds serialization support for Cirq types, with the format:
```
{
    'cirq_type': 'type',
    'typename': $NAME
}
```
where `$NAME` is the `cirq_type` of the object in its JSON representation. For type T, `$NAME` is usually `T.__name__`, but some types (mostly in `cirq_google`) do not follow this rule. The `json_cirq_type` protocol and `_json_cirq_type_` magic method are provided to handle this.

It is worth noting that this PR explicitly **does not** support serialization of non-Cirq types (e.g. python builtins, sympy and numpy objects) despite instances of these objects being serializable in Cirq. This support can be added to `json_cirq_type` and `_cirq_object_hook` in `json_serialization.py` if we decide it is necessary; I left it out of this PR as it is not required by the motivating changes behind this PR (quantumlib#4640 and sub-PRs).
harry-phasecraft pushed a commit to PhaseCraft/Cirq that referenced this pull request Oct 31, 2024
Prior to this PR, Cirq supported serialization of _instances_ of Cirq types, but not the types themselves. This PR adds serialization support for Cirq types, with the format:
```
{
    'cirq_type': 'type',
    'typename': $NAME
}
```
where `$NAME` is the `cirq_type` of the object in its JSON representation. For type T, `$NAME` is usually `T.__name__`, but some types (mostly in `cirq_google`) do not follow this rule. The `json_cirq_type` protocol and `_json_cirq_type_` magic method are provided to handle this.

It is worth noting that this PR explicitly **does not** support serialization of non-Cirq types (e.g. python builtins, sympy and numpy objects) despite instances of these objects being serializable in Cirq. This support can be added to `json_cirq_type` and `_cirq_object_hook` in `json_serialization.py` if we decide it is necessary; I left it out of this PR as it is not required by the motivating changes behind this PR (quantumlib#4640 and sub-PRs).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Makes googlebot stop complaining. size: M 50< lines changed <250
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants