SimulatorBase independent qubits optimization #4100
Renamed methods and added optional validation to factorizing in the last two commits. Should be gtg.
Confirmed that the new changes LGTM. Did you want further review on this for linalg correctness, or should I go ahead with the merge?
You can merge I think
Automerge cancelled: A required status check is not present. Missing statuses: ['cla/google']
@95-martin-orion I noticed a difference in the density matrix results when doing a reset after a Bell state. Is this a problem, or is the density matrix representation unaffected by multiplication by a constant? I also tried measure and phase_damp, and both of those produce the same results as the original method. Reset was the only one that did something different. Code below.

import cirq

# Reference run: simulate the full state without splitting.
sim = cirq.DensityMatrixSimulator(split_untangled_states=False)
q0, q1 = cirq.LineQubit.range(2)
circuit = cirq.Circuit(cirq.H(q0), cirq.CX.on(q0, q1), cirq.reset(q1))
result = sim.simulate(circuit)
print()
print(circuit)
print()
print('original calculation:')
print(result.final_density_matrix)

# Same circuit, with independent qubit sets simulated separately.
sim = cirq.DensityMatrixSimulator(split_untangled_states=True)
result = sim.simulate(circuit)
print()
print('split states:')
print(result.final_density_matrix)
Yes, this is a problem: one of the conditions on a valid density matrix is that its trace (sum of the elements on the diagonal) must be equal to 1. Multiplying the matrix by a constant produces an invalid density matrix. It's not immediately clear to me whether the issue here stems from normalization of state components in
Alright I will figure it out.
TIL what a partial trace is
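For reference, the trace condition and the partial trace mentioned in this thread can be checked by hand with plain NumPy. This is a minimal sketch, not Cirq's implementation: the helpers `apply_reset_q1` and `partial_trace` are hypothetical names introduced only for illustration, and the reset is modeled with the standard Kraus operators |0><0| and |0><1|.

```python
import numpy as np

# Bell state (|00> + |11>)/sqrt(2) as a 4x4 density matrix, basis order |q0 q1>.
psi = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
bell = np.outer(psi, psi.conj())


def apply_reset_q1(rho):
    """Apply a reset channel to q1 (hypothetical helper, Kraus form)."""
    k0 = np.array([[1, 0], [0, 0]], dtype=complex)  # |0><0|
    k1 = np.array([[0, 1], [0, 0]], dtype=complex)  # |0><1|
    out = np.zeros_like(rho)
    for k in (k0, k1):
        full = np.kron(np.eye(2), k)  # act on q1 only
        out += full @ rho @ full.conj().T
    return out


def partial_trace(rho, keep):
    """Trace out one qubit of a two-qubit density matrix (hypothetical helper).

    keep=0 returns the reduced state of q0, keep=1 that of q1.
    """
    r = rho.reshape(2, 2, 2, 2)  # indices: q0, q1, q0', q1'
    if keep == 0:
        return np.einsum('ajbj->ab', r)
    return np.einsum('jajb->ab', r)


after = apply_reset_q1(bell)
rho_q0 = partial_trace(after, keep=0)
rho_q1 = partial_trace(after, keep=1)

print(np.trace(after).real)   # trace of the full state after reset
print(np.trace(rho_q0).real)  # trace of each factor
print(np.trace(rho_q1).real)
```

After the reset, q1 is no longer entangled with q0, so the full state should equal `np.kron(rho_q0, rho_q1)`, and the full matrix and each factor should individually have trace 1. A factor that comes out scaled by a constant would show up immediately in these trace checks.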
Add optimization that ensures independent qubit sets are simulated independently. This is done by adding join, extract, and reorder methods to ActOnArgs, and updating SimulatorBase with the logic to merge qubit sets when necessary and split them when possible.

This optimization is enabled or disabled via a new parameter in the simulator constructors: `split_entangled_qubits`. Currently the PR has this set to True by default, though perhaps it should be disabled by default lest it break anything? The MPS simulator does not yet have `extract` defined, so there's no option to enable this feature in the MPS simulator's constructor yet, though nothing prevents this from being added later.

The perf boost of this implementation is limited because each StepResult still requires the full product state. It's still a speedup because full product state calculations only have to occur once per moment rather than once per operation, but it's not as nice as avoiding full product state calculations entirely. *That* optimization will be available in a subsequent PR that never creates the full product state if possible: StepResults will join the product state only on demand, and sampling will sample each substate independently and zip up the results, avoiding the full state join. The WIP is here: https://github.com/daxfohl/Cirq/compare/split...daxfohl:sample?expand=1.

I ramped up the number of qubits in the benchmarks to 25 for sparse and 12 for DM:

From master:

```
(cirq-py3) dax@DESKTOP-Q5MLJ3J:~/cirq$ time pytest dev_tools/profiling/benchmark_simulators_test.py
platform linux -- Python 3.8.5, pytest-5.4.3, py-1.10.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/dax/cirq
plugins: cov-2.5.1, asyncio-0.12.0, benchmark-3.2.3
collected 5 items

dev_tools/profiling/benchmark_simulators_test.py .....                [100%]

real 0m16.973s
user 0m15.754s
sys 0m3.862s
```

From split branch (the current PR):

```
(cirq-py3) dax@DESKTOP-Q5MLJ3J:~/cirq$ time pytest dev_tools/profiling/benchmark_simulators_test.py
platform linux -- Python 3.8.5, pytest-5.4.3, py-1.10.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/dax/cirq
plugins: cov-2.5.1, asyncio-0.12.0, benchmark-3.2.3
collected 5 items

dev_tools/profiling/benchmark_simulators_test.py .....                [100%]

real 0m10.073s
user 0m9.082s
sys 0m3.805s
```

From sample branch (future iteration mentioned above):

```
(cirq-py3) dax@DESKTOP-Q5MLJ3J:~/cirq$ time pytest dev_tools/profiling/benchmark_simulators_test.py
platform linux -- Python 3.8.5, pytest-5.4.3, py-1.10.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/dax/cirq
plugins: cov-2.5.1, asyncio-0.12.0, benchmark-3.2.3
collected 5 items

dev_tools/profiling/benchmark_simulators_test.py .....                [100%]

real 0m2.885s
user 0m3.523s
sys 0m2.597s
```

Initial PR for quantumlib#3240

Closes quantumlib#882
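The join and independent-sampling ideas in the description can be illustrated with a small NumPy sketch. This is an assumption-laden toy, not the PR's code: it uses bare state vectors rather than ActOnArgs, and `sample_bits` is a hypothetical helper. It shows that joining two independently simulated substates via a Kronecker product matches simulating the full state, and that sampling each substate separately and zipping the bits avoids building the joined state at all.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

# Two unentangled qubits, each simulated on its own 2-element state vector.
state_q0 = H @ np.array([1.0, 0.0])  # |+>
state_q1 = np.array([0.0, 1.0])      # |1>

# "Join": the full product state is the Kronecker product of the parts.
joined = np.kron(state_q0, state_q1)

# Reference: the same gate applied to the full 4-element state |01>.
full = np.kron(H, np.eye(2)) @ np.kron([1.0, 0.0], [0.0, 1.0])


def sample_bits(state, rng, repetitions):
    """Sample computational-basis bit tuples from a state vector (hypothetical helper)."""
    probs = np.abs(state) ** 2
    probs = probs / probs.sum()  # guard against floating-point drift
    n = int(np.log2(len(state)))
    indices = rng.choice(len(state), size=repetitions, p=probs)
    # Big-endian bit order: the first qubit is the most significant bit.
    return [tuple((int(i) >> (n - 1 - k)) & 1 for k in range(n)) for i in indices]


rng = np.random.default_rng(0)
bits_q0 = sample_bits(state_q0, rng, repetitions=5)
bits_q1 = sample_bits(state_q1, rng, repetitions=5)

# Zip the per-substate samples into full measurement records without ever joining.
zipped = [a + b for a, b in zip(bits_q0, bits_q1)]
print(zipped)
```

Because the substates are independent, the zipped records follow the same distribution as samples drawn from `joined`, while the memory cost stays at the size of the largest substate rather than the full product state.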