Pauli gate initialization is slower than other gates #6274

Closed
zchen088 opened this issue Aug 30, 2023 · 5 comments · Fixed by #6316
Labels
area/performance kind/bug-report Something doesn't seem to work. triage/accepted A consensus emerged that this bug report, feature request, or other action should be worked on

Comments

@zchen088
Collaborator

Description of the issue
Pauli gate initialization is somewhat slower than other gates

How to reproduce the issue

import cirq

q = cirq.q(0)
%timeit cirq.I(q)
%timeit cirq.X(q)

# Raise X to a power outside the timed statement so only the .on() call is measured.
xhalf = cirq.X**0.5
%timeit xhalf(q)

Here's what I get on my machine:

In [74]: %timeit cirq.I(q)
1.45 µs ± 86.6 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [75]: %timeit cirq.X(q)
3.47 µs ± 173 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [76]: %timeit xhalf(q)
2.86 µs ± 704 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Cirq version
1.3.0.dev20230802160330
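
Outside IPython, a comparable measurement can be made with the standard-library `timeit` module. The helper below is only a sketch: `ns_per_call` and `stand_in` are hypothetical names that are not part of Cirq, and `stand_in` is a placeholder for the constructor call being measured.

```python
import statistics
import timeit


def ns_per_call(fn, number=100_000, repeat=7):
    """Return (mean, stdev) of the per-call time in nanoseconds over `repeat` runs."""
    runs = timeit.repeat(fn, number=number, repeat=repeat)
    per_call = [total / number * 1e9 for total in runs]
    return statistics.mean(per_call), statistics.stdev(per_call)


def stand_in():
    # Placeholder for a call such as cirq.X(q); it builds a small object to time.
    return tuple((0,))


mean_ns, std_ns = ns_per_call(stand_in, number=10_000)
print(f"{mean_ns:.0f} ns ± {std_ns:.0f} ns per call")
```

With Cirq installed, passing `lambda: cirq.X(q)` instead of `stand_in` should give numbers in the same ballpark as `%timeit`, although the extra lambda call adds a small constant overhead to every measurement.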

@zchen088 zchen088 added the kind/bug-report Something doesn't seem to work. label Aug 30, 2023
@suyashdamle
Contributor

I'd like to take this up as my first investigative issue. Thanks!

@ghost

ghost commented Oct 2, 2023

Hey @suyashdamle, I'd be interested in collaborating on this with you if possible. I'm also new to contributing, haha.

@NoureldinYosri
Collaborator

I think this is due to validation during the construction of GateOperation. You can turn off validation to boost performance:

In [3]: %timeit cirq.I(q)
1.63 µs ± 33.3 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [4]: cirq.__cirq_debug__.set(False)
Out[4]: <Token var=<ContextVar name='__cirq_debug__' default=True at 0x7f723657d8a0> at 0x7f71e9e25a80>

In [5]: %timeit cirq.I(q)
629 ns ± 8.14 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

A better way would be to do that with a context manager, so the setting only applies inside the block, e.g.

In [8]: with cirq.with_debug(False):
   ...:     %timeit cirq.I(q)
   ...: 
630 ns ± 4.57 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

@suyashdamle
Contributor

suyashdamle commented Oct 3, 2023

Thanks for the extra context @NoureldinYosri
Will do some analysis to confirm & send a PR if needed at all

@shef4 Thanks! I'm not sure about the scope of this issue yet. I'll let you know after investigating if I need extra hands!

@tanujkhattar tanujkhattar added triage/needs-more-evidence [Feature requests] Seems plausible, but maintainers are not convinced about the use cases yet triage/accepted A consensus emerged that this bug report, feature request, or other action should be worked on and removed triage/needs-more-evidence [Feature requests] Seems plausible, but maintainers are not convinced about the use cases yet labels Oct 13, 2023
@tanujkhattar tanujkhattar self-assigned this Oct 13, 2023
@tanujkhattar
Collaborator

@zchen088 Constructing Pauli operations is slower because a Pauli gate, when applied to qubits, yields a SingleQubitPauliStringGateOperation, an operation type that derives from both GateOperation and PauliString. The other gates in your example, when applied to qubits, simply yield a GateOperation. The more complicated type hierarchy for Paulis exists to support the workflow where you multiply single-qubit Pauli operations to get back a multi-qubit Pauli string (i.e. cirq.X(a) * cirq.Y(b) * cirq.Z(c) is a valid 3-qubit operation).
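
The shape of that hierarchy can be sketched with a toy model. The classes below (`GateOp`, `PauliStr`, `SinglePauliOp`) are hypothetical stand-ins, not Cirq's classes, and only show why an object that is both an operation and a one-term Pauli string does more work at construction time while also supporting multiplication.

```python
class GateOp:
    """Toy stand-in for a generic gate operation on some qubits."""

    def __init__(self, gate, qubits):
        self.gate = gate
        self.qubits = tuple(qubits)


class PauliStr:
    """Toy stand-in for a Pauli string: a mapping from qubit to 'X', 'Y' or 'Z'."""

    def __init__(self, paulis):
        self.paulis = dict(paulis)

    def __mul__(self, other):
        if not isinstance(other, PauliStr):
            return NotImplemented
        # Only disjoint qubits are handled; same-qubit products need phase tracking.
        if self.paulis.keys() & other.paulis.keys():
            raise NotImplementedError("overlapping qubits are out of scope here")
        return PauliStr({**self.paulis, **other.paulis})


class SinglePauliOp(GateOp, PauliStr):
    """Both a gate operation and a one-term Pauli string, so two initializers run."""

    def __init__(self, pauli, qubit):
        GateOp.__init__(self, pauli, [qubit])
        PauliStr.__init__(self, {qubit: pauli})


product = SinglePauliOp("X", "a") * SinglePauliOp("Y", "b") * SinglePauliOp("Z", "c")
print(product.paulis)
```

In this sketch a plain `GateOp` runs one initializer and a `SinglePauliOp` runs two, which mirrors, in a much simplified way, the extra construction cost described above.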

After some investigation, it looks like the relative imports within the Pauli.on() method were hurting performance. I've opened #6316 to fix this. I used the following code as a test and compared performance before and after my PR.

Test Code

import cirq

q = cirq.q(0)
xhalf = cirq.X**0.5
# Disable validation so that only the construction cost is compared.
with cirq.with_debug(False):
    %timeit cirq.I(q)
    %timeit cirq.X(q)
    %timeit xhalf(q)

Before my PR / on master

770 ns ± 7.84 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
3.35 µs ± 108 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
747 ns ± 6.61 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

As you can see, constructing the single-qubit Pauli operation is ~4x slower than the other two.

After my PR

803 ns ± 25.8 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
1.87 µs ± 19.3 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
752 ns ± 35.9 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

After the change, the single-qubit Pauli operation is only ~2x slower. This is as fast as it gets with the current type hierarchy.
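
For reference, the ratios behind the "~4x" and "~2x" figures can be recomputed from the mean timings quoted above; this is plain arithmetic on the reported numbers, not a new measurement.

```python
# Mean timings in nanoseconds, copied from the %timeit output above.
before = {"I": 770, "X": 3350, "xhalf": 747}
after = {"I": 803, "X": 1870, "xhalf": 752}

for label, t in (("before", before), ("after", after)):
    # Ratio of the Pauli X construction time to each of the other two gates.
    print(label, round(t["X"] / t["I"], 2), round(t["X"] / t["xhalf"], 2))

# How much faster cirq.X(q) itself became between the two runs.
print("X speedup:", round(before["X"] / after["X"], 2))
```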

If this is still a bottleneck, maybe you can share your exact workflow and we can look for potential optimizations that avoid major changes to Cirq's type hierarchy, which would be a large backwards-incompatible change.
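
As a rough illustration of the import overhead mentioned earlier: assuming the imports in question ran inside the method body on every call (the thread does not show the code), the cost can be reproduced with the standard library alone. The function names below are hypothetical, and `OrderedDict` is just a convenient class to import.

```python
import timeit
from collections import OrderedDict as _OrderedDict


def with_local_import(n):
    # The import statement executes on every call, even though the module is cached.
    from collections import OrderedDict
    return OrderedDict(a=n)


def with_module_import(n):
    # The name was bound once, when the module was loaded.
    return _OrderedDict(a=n)


local_s = timeit.timeit(lambda: with_local_import(1), number=200_000)
module_s = timeit.timeit(lambda: with_module_import(1), number=200_000)
print(f"local import: {local_s:.3f} s, module-level import: {module_s:.3f} s")
```

Both functions return the same value; the local-import variant typically takes noticeably longer per call because the import machinery is consulted each time, though the exact gap depends on the Python version and machine.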

xref #6097
