Add support for "flat_sample": sampling without keys for performance #3808

Strilanc · 2021-02-15T14:10:10Z

Currently Cirq has a bit of a performance problem. One of the contributing factors to this is that there are very few "escape hatches" that you can use to opt out of features that slow things down. This suggestion is to add one such escape hatch.

A "flat sample" is a 2d numpy uint8 array where the major index is "repetition index" and the minor axis lists off the output bits ordered by moment, then by within-moment order, then by qubit order within each measurement operation. If we want to go even more extreme, the array could be bit packed to reduce space usage by 8x although then it's much harder for people to use.

To be explicit:

k = 0
for t, moment in enumerate(circuit):
    for op in moment:
        if isinstance(op.gate, cirq.MeasurementGate):
            for q in op.qubits:
                print(f"flat index {k} refers to the measurement of qubit {q} in moment {t}")
                k += 1

Flat measurements can be more performant than non-flat measurements because they don't require scattering values across memory as they are collected. They also implicitly avoid key collision issues to do with measurements being repeated in a loop. This makes them much more amenable to e.g. error correction circuits.

I propose that we modify Sampler to have a flat_sample(circuit) method that by default delegates to Sampler.run, but which we then write more efficient implementations for in cirq.Simulator.

The text was updated successfully, but these errors were encountered:

daxfohl · 2021-02-16T12:57:17Z

Do you have an example of when the cost of key lookup would be significant compared to the cost of doing the simulation itself? It would be valuable to be able to provide before/after benchmarks vs the cost of maintaining the feature.

daxfohl · 2021-02-17T03:22:59Z

Also I wonder if this would be better done using a strategy pattern instead of inheritance. So, you set (or preferably inject) repetition_sampling_strategy instead of requiring a method override. That would allow you to pick and choose which strategy you want rather than having it bound to the simulator class and dealing with inheritance. (There are a number of other things in simulators that I feel are orthogonal concerns and may be better to extract into strategy patterns as well, but that's a different topic).

(Also, granted I'm still new to Python so if there's a more Pythonic way of doing this then that's fine too.)

maffoo · 2021-02-17T16:03:42Z

I like the idea of a packed representation because it's very natural to get from hardware. A different way to support something like this would be to make cirq.Result an abstract type instead of a concrete type. Then samplers would be free to make their own implementations using whatever packed data representation they want, as long as they support "unpacking" to access data for particular measurements by key, per the Result API.

Note, the v1 engine API defined exactly this sort of packed result format: https://github.com/quantumlib/Cirq/blob/master/cirq/google/api/v1/program.proto#L35 (there, the results are actually bit packed, giving the ~8x size reduction mentioned by @Strilanc). We ended up changing this in the v2 api to use a different format with results indexed by key, but TBH I think it'd be worth revisiting this decision for performance reasons, especially if the cirq.Result type itself becomes abstract and we have implementations for packed data.

maffoo · 2021-02-17T16:08:03Z

A somewhat related issue is #3233 which wants to generalize the Result type along other axes such as allowing ints (for qudit measurements) or IQ points (for lower-level hardware readout info) instead of bits. In those cases, as here, I think decoupling the cirq.Result interface from particular concrete implementations would be very helpful.

daxfohl · 2021-02-20T20:06:20Z

Also is this something that would have to be done in the simulator, or is it something that could be done in post processing? If the latter, we can make a function that takes in a measurement map and a circuit, and output a json-like object that has all the nested subcircuit and repetition measurements organized as nested objects and arrays. Then that should be easy analyze efficiently.

Basically, does the perf bottleneck primarily occur during the simulation itself, or in the post analysis? If the latter, then maybe the above could help resolve the issue.

balopat · 2021-02-23T01:35:08Z

+1 to dax's comment - I would like to understand a bit more the use case(s) where the performance issue shows itself, please comment on that @Strilanc - alternatively we can discuss on Cirq Cynque (added the label).

Regarding the implementation: I'm open towards this - I like @maffoo's abstract Result direction + revisiting the program proto + tying it in with the IQ points solution potentially.

daxfohl · 2021-02-23T05:20:33Z

Maybe the simplest solution is to change the classes where you measure and store results from Dict to OrderedDict. Then you can reduce that to raw ordered list of measurements and pack it as much as you want after the fact. This way you don't have to inject functionality into all the simulators or add extra layers of abstraction.

Alternatively, a function that takes a circuit and gives you the ordered list of measurements it produces. You can then create a packed ordered measurement array from that and the result dictionary.

Gut feeling is we probably want to avoid abstracting Result here. I don't see a case for these two representations to be used interchangeably. Trying to force them into an inheritance hierarchy may just cause confusion in the end. Better to keep them as two separate things for now.

daxfohl · 2021-02-23T15:19:29Z

I just saw there's a measurement_keys protocol, but it returns a set instead of a list. If we change that to list, then it looks like it returns everything in order (it worked on all the test cases I tried anyway: subcircuits, repetition ids, key mappings). From which you can do the second option I mentioned above. Would that solve the problem?

Strilanc · 2021-02-23T22:06:01Z

Do you have an example of when the cost of key lookup would be significant compared to the cost of doing the simulation itself?

It's not so much that it's a significant runtime cost as that it is one of the cuts in a death by a thousand cuts.

...Using a bit packed format instead of a bool-per-byte format often gives noticeable speed benefits just due to touching less memory, but I wouldn't expect most data sources and consumers to want to deal with the hassle of packed bits.

I think it's notable that the flat format, and the bit packed format, are much closer to what is actually produced by the hardware. This suggests there may be implementation convenience benefits, and removal of conversion costs.

daxfohl · 2021-02-24T04:46:07Z

If we want this optimization during simulation, we could invert the above option. Have the record_measurement_result function record it in a packed array (which will be only a single code change this once #3841 is merged, which creates an ActOnArgs base class that contains this implementation), and then use the new measurement_keys_in_order protocol mentioned above to unpack the array into a dictionary before returning the run result. And of course provide a run_without_unpacking that returns the raw array.

This approach would give the functionality to all the _act_on_ simulators (sparse, DM, clifford, maybe mps soon). @Strilanc, which simulators are most important here? Are you primarily concerned about these, or qsim, etc?

balopat · 2021-02-24T19:34:23Z

Cirq Cynque:

unanimous agreement around getting this done: abstract Result class + having the ability for different runtime to use whatever native result format and then provide converter methods to it
needs detailed design (a little RFC would be nice!)

@95-martin-orion

Adds a `ClassicalDataStore` class so we can keep track of which qubits are associated to which measurements. Closes #3232. Initially this was created as part 14 (of 14) of https://tinyurl.com/cirq-feedforward to enable qudits in classical conditions, by storing and using dimensions of the measured qubits when calculating the integer value of each measurement when resolving sympy expressions. However it may have broader applicability. This approach also sets us up to more easily add different types of measurements (#3233, #4274). It will also ease the path to #3002 and #4449., as we can eventually pass this into `Result` rather than the raw `log_of_measurement_results` dictionary. (The return type of `_run` will have to be changed to `Sequence[C;assicalDataStoreReader]`. Related: #887, #3231 (open question @95-martin-orion whether this closes those or not) This PR contains a `ClassicalDataStoreReader` and `ClassicalDataStoreBase` parent "interface" for the `ClassicalDataStore` class as well. This will allow us to swap in different representations that may have different performance characteristics. See #3808 for an example use case. This could be done by adding an optional `ClassicalDataStore` factory method argument to the `SimulatorBase` initializer, or separately to sampler classes. (Note this is an alternative to #4778 for supporting qudits in sympy classical control expressions, as discussed here: https://github.com/quantumlib/Cirq/pull/4778/files#r774816995. The other PR was simpler and less invasive, but a bit hacky. I felt even though bigger, this seemed like the better approach and especially fits better with our future direction, and closed the other one). **Breaking Changes**: 1. The abstract method `SimulatorBase._create_partial_act_on_args` argument `log_of_measurement_results: Dict` has been changed to `classical_data: ClassicalData`. Any third-party simulators that inherit `SimulatorBase` will need to update their implementation accordingly. 2. The abstract base class `ActOnArgs.__init__` argument `log_of_measurement_results: Dict` is now copied before use. For users that depend on the pass-by-reference semantics (this should be rare), they can use the new `classical_data: ClassicalData` argument instead, which is pass-by-reference.

@95-martin-orion

…mlib#4781) Adds a `ClassicalDataStore` class so we can keep track of which qubits are associated to which measurements. Closes quantumlib#3232. Initially this was created as part 14 (of 14) of https://tinyurl.com/cirq-feedforward to enable qudits in classical conditions, by storing and using dimensions of the measured qubits when calculating the integer value of each measurement when resolving sympy expressions. However it may have broader applicability. This approach also sets us up to more easily add different types of measurements (quantumlib#3233, quantumlib#4274). It will also ease the path to quantumlib#3002 and quantumlib#4449., as we can eventually pass this into `Result` rather than the raw `log_of_measurement_results` dictionary. (The return type of `_run` will have to be changed to `Sequence[C;assicalDataStoreReader]`. Related: quantumlib#887, quantumlib#3231 (open question @95-martin-orion whether this closes those or not) This PR contains a `ClassicalDataStoreReader` and `ClassicalDataStoreBase` parent "interface" for the `ClassicalDataStore` class as well. This will allow us to swap in different representations that may have different performance characteristics. See quantumlib#3808 for an example use case. This could be done by adding an optional `ClassicalDataStore` factory method argument to the `SimulatorBase` initializer, or separately to sampler classes. (Note this is an alternative to quantumlib#4778 for supporting qudits in sympy classical control expressions, as discussed here: https://github.com/quantumlib/Cirq/pull/4778/files#r774816995. The other PR was simpler and less invasive, but a bit hacky. I felt even though bigger, this seemed like the better approach and especially fits better with our future direction, and closed the other one). **Breaking Changes**: 1. The abstract method `SimulatorBase._create_partial_act_on_args` argument `log_of_measurement_results: Dict` has been changed to `classical_data: ClassicalData`. Any third-party simulators that inherit `SimulatorBase` will need to update their implementation accordingly. 2. The abstract base class `ActOnArgs.__init__` argument `log_of_measurement_results: Dict` is now copied before use. For users that depend on the pass-by-reference semantics (this should be rare), they can use the new `classical_data: ClassicalData` argument instead, which is pass-by-reference.

@95-martin-orion

…mlib#4781) Adds a `ClassicalDataStore` class so we can keep track of which qubits are associated to which measurements. Closes quantumlib#3232. Initially this was created as part 14 (of 14) of https://tinyurl.com/cirq-feedforward to enable qudits in classical conditions, by storing and using dimensions of the measured qubits when calculating the integer value of each measurement when resolving sympy expressions. However it may have broader applicability. This approach also sets us up to more easily add different types of measurements (quantumlib#3233, quantumlib#4274). It will also ease the path to quantumlib#3002 and quantumlib#4449., as we can eventually pass this into `Result` rather than the raw `log_of_measurement_results` dictionary. (The return type of `_run` will have to be changed to `Sequence[C;assicalDataStoreReader]`. Related: quantumlib#887, quantumlib#3231 (open question @95-martin-orion whether this closes those or not) This PR contains a `ClassicalDataStoreReader` and `ClassicalDataStoreBase` parent "interface" for the `ClassicalDataStore` class as well. This will allow us to swap in different representations that may have different performance characteristics. See quantumlib#3808 for an example use case. This could be done by adding an optional `ClassicalDataStore` factory method argument to the `SimulatorBase` initializer, or separately to sampler classes. (Note this is an alternative to quantumlib#4778 for supporting qudits in sympy classical control expressions, as discussed here: https://github.com/quantumlib/Cirq/pull/4778/files#r774816995. The other PR was simpler and less invasive, but a bit hacky. I felt even though bigger, this seemed like the better approach and especially fits better with our future direction, and closed the other one). **Breaking Changes**: 1. The abstract method `SimulatorBase._create_partial_act_on_args` argument `log_of_measurement_results: Dict` has been changed to `classical_data: ClassicalData`. Any third-party simulators that inherit `SimulatorBase` will need to update their implementation accordingly. 2. The abstract base class `ActOnArgs.__init__` argument `log_of_measurement_results: Dict` is now copied before use. For users that depend on the pass-by-reference semantics (this should be rare), they can use the new `classical_data: ClassicalData` argument instead, which is pass-by-reference.

Strilanc added the kind/feature-request Describes new functionality label Feb 15, 2021

balopat added area/measurements area/result labels Feb 23, 2021

smitsanghavi mentioned this issue Feb 27, 2021

Measurement Results Refactor #3868

Closed

daxfohl mentioned this issue Dec 26, 2021

Add ClassicalDataStore class to keep track of qubits measured #4781

Merged

MichaelBroughton added the time/after-1.0 label Mar 28, 2022

95-martin-orion mentioned this issue May 3, 2022

Expand measurement results #3233

Closed

MichaelBroughton removed the time/after-1.0 label Jul 18, 2022

mhucka added this to the 2025-T1: Performance improvements for large circuits milestone Nov 13, 2024

mhucka added this to Cirq 2025 roadmap Nov 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for "flat_sample": sampling without keys for performance #3808

Add support for "flat_sample": sampling without keys for performance #3808

Strilanc commented Feb 15, 2021

daxfohl commented Feb 16, 2021

daxfohl commented Feb 17, 2021 •

edited

Loading

maffoo commented Feb 17, 2021

maffoo commented Feb 17, 2021

daxfohl commented Feb 20, 2021

balopat commented Feb 23, 2021

daxfohl commented Feb 23, 2021 •

edited

Loading

daxfohl commented Feb 23, 2021

Strilanc commented Feb 23, 2021 •

edited

Loading

daxfohl commented Feb 24, 2021

balopat commented Feb 24, 2021 •

edited

Loading

Add support for "flat_sample": sampling without keys for performance #3808

Add support for "flat_sample": sampling without keys for performance #3808

Comments

Strilanc commented Feb 15, 2021

daxfohl commented Feb 16, 2021

daxfohl commented Feb 17, 2021 • edited Loading

maffoo commented Feb 17, 2021

maffoo commented Feb 17, 2021

daxfohl commented Feb 20, 2021

balopat commented Feb 23, 2021

daxfohl commented Feb 23, 2021 • edited Loading

daxfohl commented Feb 23, 2021

Strilanc commented Feb 23, 2021 • edited Loading

daxfohl commented Feb 24, 2021

balopat commented Feb 24, 2021 • edited Loading

daxfohl commented Feb 17, 2021 •

edited

Loading

daxfohl commented Feb 23, 2021 •

edited

Loading

Strilanc commented Feb 23, 2021 •

edited

Loading

balopat commented Feb 24, 2021 •

edited

Loading