Qubit allocation / deallocation / move #3240

Closed
balopat opened this issue Aug 19, 2020 · 11 comments
Labels
roadmap for higher level roadmap items to capture conversations and feedback (not for project tracking)

Comments

@balopat
Contributor

balopat commented Aug 19, 2020

Problem: A common simulator optimization is to let qubits exist for only part of the simulation. Qubits can be allocated and deallocated as the simulation runs, which keeps the simulated state smaller at each step (less memory and less time). This type of optimization is useful in quantum error correction and in open systems simulation. In addition, there are architectures in which moving a qubit around is a useful concept, and one good way to implement this is to treat move and swap operations as a separate type of operation. Simulators can often interpret these as tensor index re-ordering, which can be used to speed up simulation.

Rough requirements:

  • Define how one can allocate and deallocate a qubit.
  • Define movement gates.
  • Ensure that simulators can easily spot allocations, deallocations, and movement.
  • Implement optimizations in simulators.
@balopat balopat added the roadmap for higher level roadmap items to capture conversations and feedback (not for project tracking) label Aug 19, 2020
@95-martin-orion
Collaborator

There's an argument to be made that #4100 will fulfill the majority of this feature request. In short:

  • Qubits are "allocated" when they first appear in a circuit.
  • Qubits are "deallocated" when they are measured.
  • Simulators handle (de-)allocation by splitting and merging states, as defined in their associated ActOnArgs class.

Qubit-moving is not explicitly covered in that PR, but it does add a "reorder" method to the ActOnArgs interface which could be used for this purpose.
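
As a rough illustration of the implicit scheme described above, here is a minimal sketch of the bookkeeping only; it is not Cirq's implementation. The `QubitSets` class and its method names are hypothetical. It assumes that any multi-qubit gate entangles its targets (a conservative simplification) and that measuring a qubit always disentangles it.

```python
from typing import List, Set


class QubitSets:
    """Toy tracker of which qubits share a joint state (hypothetical helper).

    It stores only set membership, not amplitudes, so it shows when a
    simulator would have to join or split states and what that costs.
    """

    def __init__(self) -> None:
        self.sets: List[Set[str]] = []

    def _find(self, qubit: str) -> Set[str]:
        for group in self.sets:
            if qubit in group:
                return group
        # First appearance in the circuit: implicit allocation.
        group = {qubit}
        self.sets.append(group)
        return group

    def apply_gate(self, *qubits: str) -> None:
        """Join the sets of all target qubits (assumes the gate entangles)."""
        groups: List[Set[str]] = []
        for qubit in qubits:
            group = self._find(qubit)
            if not any(group is seen for seen in groups):
                groups.append(group)
        if len(groups) > 1:
            merged = set().union(*groups)
            self.sets = [s for s in self.sets if not any(s is g for g in groups)]
            self.sets.append(merged)

    def measure(self, qubit: str) -> None:
        """Split the measured qubit into its own set: implicit deallocation."""
        group = self._find(qubit)
        if len(group) > 1:
            group.remove(qubit)
            self.sets.append({qubit})

    def amplitude_count(self) -> int:
        """Number of state-vector amplitudes needed for the split representation."""
        return sum(2 ** len(group) for group in self.sets)


tracker = QubitSets()
tracker.apply_gate("a", "b")
tracker.apply_gate("b", "c")
joined_cost = tracker.amplitude_count()   # one 3-qubit state
tracker.measure("c")
split_cost = tracker.amplitude_count()    # a 2-qubit state plus a 1-qubit state
```

In this toy run the cost drops from 8 amplitudes to 6 after the measurement; the saving grows exponentially with the size of the sets that can be kept apart.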

@daxfohl
Collaborator

daxfohl commented May 12, 2021

"reorder" would only handle moving if both qubits were in the same entangled qubit set. If they weren't, you could of course join the two qubit sets and then reorder, which would work, but then you've got an unnecessarily joined set.

An optimization would be to add logic to core_iterator that explicitly looks for swap gates, and then renames the corresponding two qubits in the two qubit sets (requires a new rename_qubit function; straightforward to write). Renaming would be more efficient than reordering too because the latter physically restructures the tensor to the specified order.
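
The difference between renaming and reordering can be sketched as follows. This is a hedged illustration, not the code from any PR: `SubState` and `swap_by_renaming` are hypothetical names, and each substate is assumed to be a flat amplitude list plus one label per qubit axis.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubState:
    """One independent qubit set (hypothetical structure for illustration)."""
    qubits: List[str]          # axis labels, one per qubit
    amplitudes: List[complex]  # 2 ** len(qubits) entries


def swap_by_renaming(states: List[SubState], q0: str, q1: str) -> None:
    """Apply SWAP(q0, q1) by exchanging axis labels only.

    No amplitude is touched and no sets are joined, so the cost does not
    depend on the state size. Also works if both qubits share one set.
    """
    s0 = next(s for s in states if q0 in s.qubits)
    s1 = next(s for s in states if q1 in s.qubits)
    i0 = s0.qubits.index(q0)
    i1 = s1.qubits.index(q1)
    # The data that used to belong to q0 is now called q1, and vice versa.
    s0.qubits[i0], s1.qubits[i1] = q1, q0


one = SubState(qubits=["q0"], amplitudes=[0, 1])   # q0 starts in |1>
zero = SubState(qubits=["q1"], amplitudes=[1, 0])  # q1 starts in |0>
states = [one, zero]
swap_by_renaming(states, "q0", "q1")
```

After the call, the |1> amplitudes are labelled `q1` and the |0> amplitudes are labelled `q0`, while the two substates remain separate.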

@daxfohl
Collaborator

daxfohl commented Jun 10, 2021

#4169 handles the qubit-moving part.

@95-martin-orion
Collaborator

The PRs mentioned above provide implicit qubit (de-)allocation in simulation - i.e., there is no "allocate qubit" operation, the simulators simply detect changes in qubit entanglement and adjust accordingly.

@dabacon, does this fulfill the requirements for ion trap simulation (particularly with regards to "moving" qubits), or are separate, explicit qubit (de-)allocation operations also necessary?

@viathor
Collaborator

viathor commented Jun 23, 2021

From @mpharrigan: could we add support for tracing qubits out (rather than measuring them)?

CirqBot pushed a commit that referenced this issue Jul 2, 2021
Add optimization that ensures independent qubit sets are simulated independently. This is done by adding join, extract, and reorder methods to ActOnArgs, and updating SimulatorBase with the logic to merge qubit sets when necessary and split them when possible.

This optimization is enabled or disabled via a new parameter in the simulator constructors: `split_entangled_qubits`. Currently the PR has this set to True by default, though perhaps it should be disabled by default lest it break anything. The MPS simulator does not yet have `extract` defined, so there's no option to enable this feature in the MPS simulator's constructor yet, though nothing prevents this from being added later.

The perf boost of this implementation is limited because each StepResult still requires the full product state. It's still a speedup because full product state calculations only have to occur once per moment rather than once per operation, but it's not as nice as avoiding full product state calculations entirely. *That* optimization will be available in a subsequent PR that never creates the full product state if possible: StepResults will join the product state only on demand, and sampling will sample each substate independently and zip up the results, avoiding the full state join. The WIP is here: https://github.com/daxfohl/Cirq/compare/split...daxfohl:sample?expand=1.
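
The "sample each substate independently and zip up the results" idea can be sketched like this. It is only an illustration under stated assumptions, not the code on the WIP branch: the function names are hypothetical, and each substate is given directly as a probability list over its basis states (big-endian over its qubits) rather than as amplitudes.

```python
import random
from typing import Dict, List, Sequence, Tuple

SubState = Tuple[Sequence[str], Sequence[float]]  # (qubit names, basis-state probabilities)


def sample_substate(
    qubits: Sequence[str],
    probs: Sequence[float],
    repetitions: int,
    rng: random.Random,
) -> List[List[int]]:
    """Draw `repetitions` bitstrings from one independent substate."""
    n = len(qubits)
    outcomes = rng.choices(range(2 ** n), weights=probs, k=repetitions)
    # Unpack each integer outcome into one bit per qubit, first qubit = high bit.
    return [[(o >> (n - 1 - j)) & 1 for j in range(n)] for o in outcomes]


def sample_product_state(
    substates: Sequence[SubState],
    repetitions: int,
    seed: int = 0,
) -> Dict[str, List[int]]:
    """Sample every substate on its own and zip the columns together.

    The full 2 ** (total qubits) distribution is never constructed.
    """
    rng = random.Random(seed)
    columns: Dict[str, List[int]] = {}
    for qubits, probs in substates:
        rows = sample_substate(qubits, probs, repetitions, rng)
        for j, qubit in enumerate(qubits):
            columns[qubit] = [row[j] for row in rows]
    return columns


# A Bell pair on (a, b) and an independent qubit c fixed in |1>.
substates = [(("a", "b"), [0.5, 0.0, 0.0, 0.5]), (("c",), [0.0, 1.0])]
samples = sample_product_state(substates, repetitions=20, seed=1)
```

Because the substates are independent, sampling them separately gives the same joint statistics as sampling the joined state, while the work scales with the sizes of the individual sets.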

I ramped up the number of qubits in the benchmarks to 25 for sparse and 12 for DM:

From master:
```
(cirq-py3) dax@DESKTOP-Q5MLJ3J:~/cirq$ time pytest dev_tools/profiling/benchmark_simulators_test.py
platform linux -- Python 3.8.5, pytest-5.4.3, py-1.10.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/dax/cirq
plugins: cov-2.5.1, asyncio-0.12.0, benchmark-3.2.3
collected 5 items

dev_tools/profiling/benchmark_simulators_test.py .....                                                                                                                                                    [100%]

real    0m16.973s
user    0m15.754s
sys     0m3.862s
```

From split branch (the current PR):
```
(cirq-py3) dax@DESKTOP-Q5MLJ3J:~/cirq$ time pytest dev_tools/profiling/benchmark_simulators_test.py
platform linux -- Python 3.8.5, pytest-5.4.3, py-1.10.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/dax/cirq
plugins: cov-2.5.1, asyncio-0.12.0, benchmark-3.2.3
collected 5 items

dev_tools/profiling/benchmark_simulators_test.py .....                                                                                                                                                    [100%]

real    0m10.073s
user    0m9.082s
sys     0m3.805s
```

From sample branch (future iteration mentioned above):
```
(cirq-py3) dax@DESKTOP-Q5MLJ3J:~/cirq$ time pytest dev_tools/profiling/benchmark_simulators_test.py
platform linux -- Python 3.8.5, pytest-5.4.3, py-1.10.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/dax/cirq
plugins: cov-2.5.1, asyncio-0.12.0, benchmark-3.2.3
collected 5 items

dev_tools/profiling/benchmark_simulators_test.py .....                                                                                                                                                    [100%]

real    0m2.885s
user    0m3.523s
sys     0m2.597s
```

Initial PR for #3240
Closes #882
@daxfohl
Collaborator

daxfohl commented Aug 11, 2021

@mpharrigan @viathor Note that all the PRs for this are merged into master now. Regarding tracing out: if you add reset gates to qubits you no longer need, the simulator separates them.

So, a Bell state between q0 and q1, with a reset on q1, `cirq.Circuit(cirq.H(q0), cirq.CX.on(q0, q1), cirq.reset(q1))`, would separate into

q0:
[[0.5, 0]
 [0,   0.5]]
q1:
[[1, 0]
 [0, 0]]

Does this satisfy the needs for tracing out? Do we want an optimizer that appends reset gates to unused/unmeasured qubits? Or can we mark this as done?
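
The matrices quoted above can be checked by hand without a simulator. The following is a minimal pure-Python sketch, assuming the standard reset channel with Kraus operators K_k = I ⊗ |0><k| on q1 and a basis index of 2*q0 + q1; the function names are illustrative and not part of Cirq.

```python
import math
from typing import List

Matrix = List[List[float]]


def bell_density_matrix() -> Matrix:
    """Density matrix of (|00> + |11>)/sqrt(2); basis index is 2*q0 + q1."""
    amp = 1 / math.sqrt(2)
    psi = [amp, 0.0, 0.0, amp]
    return [[a * b for b in psi] for a in psi]


def reset_q1(rho: Matrix) -> Matrix:
    """Reset channel on q1: rho -> sum_k K_k rho K_k^dagger, K_k = I (x) |0><k|.

    Entry-wise, the q1 = k diagonal blocks are summed into the q1 = 0 block.
    """
    out = [[0.0] * 4 for _ in range(4)]
    for i in range(2):      # q0 row index
        for j in range(2):  # q0 column index
            out[2 * i][2 * j] = sum(rho[2 * i + k][2 * j + k] for k in range(2))
    return out


def reduced(rho: Matrix, keep: int) -> Matrix:
    """Reduced 2x2 density matrix of one qubit (keep=0 for q0, keep=1 for q1)."""
    if keep == 0:
        return [[sum(rho[2 * i + k][2 * j + k] for k in range(2))
                 for j in range(2)] for i in range(2)]
    return [[sum(rho[2 * k + i][2 * k + j] for k in range(2))
             for j in range(2)] for i in range(2)]


rho_after = reset_q1(bell_density_matrix())
rho_q0 = reduced(rho_after, keep=0)  # maximally mixed, up to float rounding
rho_q1 = reduced(rho_after, keep=1)  # |0><0|, up to float rounding
```

Up to floating-point rounding, `rho_q0` is diag(0.5, 0.5) and `rho_q1` is diag(1, 0), and `rho_after` is exactly their tensor product, which is why the two qubits can be stored separately after the reset.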

@mpharrigan
Collaborator

Yes, I think this will work. I had to take a chunk of time to convince myself that resetting a qubit you're throwing out can't mess things up with the qubit you're keeping, and I wanted to find the hidden trace operation. In case it's helpful for others: the unitary equivalent of reset is swapping your system qubit |psi> with a |0> ancilla. The ancilla is traced out to get our familiar (non-unitary) reset channel, but |psi> is now on the ancilla! This is where the trace implicitly happens.
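
The swap-then-trace picture can be worked through numerically. This is a small sketch under stated assumptions, not library code: three qubits ordered (q0, q1, ancilla) with basis index 4*q0 + 2*q1 + a, a Bell pair on (q0, q1), and the ancilla starting in |0>. The function names are made up for the example.

```python
import math
from typing import List


def bell_with_ancilla() -> List[float]:
    """(|00> + |11>)/sqrt(2) on (q0, q1), tensored with ancilla |0>."""
    amp = 1 / math.sqrt(2)
    psi = [0.0] * 8
    psi[0b000] = amp
    psi[0b110] = amp
    return psi


def swap_q1_with_ancilla(psi: List[float]) -> List[float]:
    """Unitary SWAP between q1 and the ancilla: a permutation of amplitudes."""
    out = [0.0] * 8
    for idx, amp in enumerate(psi):
        q0, q1, a = (idx >> 2) & 1, (idx >> 1) & 1, idx & 1
        out[4 * q0 + 2 * a + q1] = amp
    return out


def trace_out_ancilla(psi: List[float]) -> List[List[float]]:
    """Density matrix of (q0, q1) after discarding the ancilla (the low bit)."""
    return [[sum(psi[2 * s + a] * psi[2 * t + a].conjugate() for a in range(2))
             for t in range(4)] for s in range(4)]


swapped = swap_q1_with_ancilla(bell_with_ancilla())
rho_system = trace_out_ancilla(swapped)
```

The result is diag(0.5, 0, 0.5, 0) up to rounding: q0 is maximally mixed and q1 is in |0>, the same state the reset channel produces. The entanglement with q0 has moved to the discarded ancilla, which is where the trace happens.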

@daxfohl
Collaborator

daxfohl commented Aug 17, 2021

@95-martin-orion looks like this can be closed now.

@mpharrigan
Collaborator

mpharrigan commented Aug 27, 2021

Just to be clear: the user needs to use DensityMatrixSimulator in order to get the correct behavior for de-allocating qubits. It's strange that we let users shoot themselves in the foot by allowing measure() and reset() calls when doing simulate.

@mpharrigan
Collaborator

Re-opening. I couldn't get de-allocation to work in any meaningful way, see #4360

@mpharrigan
Collaborator

Re-closing. Turns out there were other issues, see #4360

rht pushed a commit to rht/Cirq that referenced this issue May 1, 2023
harry-phasecraft pushed a commit to PhaseCraft/Cirq that referenced this issue Oct 31, 2024
5 participants