Pauli gate initialization is slower than other gates #6274

zchen088 · 2023-08-30T18:27:17Z

Description of the issue
Pauli gate initialization is somewhat slower than other gates

How to reproduce the issue

import cirq

q = cirq.q(0)
%timeit cirq.I(q)
%timeit cirq.X(q)

xhalf = cirq.X**0.5
%timeit xhalf(q)

Here's what I get on my machine:

In [74]: %timeit cirq.I(q)
1.45 µs ± 86.6 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [75]: %timeit cirq.X(q)
3.47 µs ± 173 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [76]: %timeit xhalf(q)
2.86 µs ± 704 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Cirq version
1.3.0.dev20230802160330

suyashdamle · 2023-09-14T06:00:14Z

I'd like to take this up as my first investigative issue. Thanks!

ghost · 2023-10-02T01:02:31Z

Hey @suyashdamle, I'd be interested in collaborating on this with you if possible. I'm also new to contributing, haha.

NoureldinYosri · 2023-10-02T18:29:11Z

I think this is due to validation during the construction of GateOperation. You can turn off validation to boost performance:

In [3]: %timeit cirq.I(q)
1.63 µs ± 33.3 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [4]: cirq.__cirq_debug__.set(False)
Out[4]: <Token var=<ContextVar name='__cirq_debug__' default=True at 0x7f723657d8a0> at 0x7f71e9e25a80>

In [5]: %timeit cirq.I(q)
629 ns ± 8.14 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

A better way would be to do that with a context manager, e.g.

In [8]: with cirq.with_debug(False):
   ...:     %timeit cirq.I(q)
   ...: 
630 ns ± 4.57 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

suyashdamle · 2023-10-03T03:29:31Z

Thanks for the extra context, @NoureldinYosri.
I'll do some analysis to confirm and send a PR if one is needed.

@shef4 Thanks! I'm not sure yet about the scope of this issue. I'll let you know after investigating if I need extra hands.

tanujkhattar · 2023-10-13T19:54:15Z

@zchen088 Constructing Pauli operations is slower because a Pauli gate, when applied to qubits, yields a SingleQubitPauliStringGateOperation, an operation type that derives from both GateOperation and PauliString. The other gates in your example, when applied to qubits, simply yield a GateOperation. The more complicated type hierarchy for Paulis exists to support the workflow where you multiply single-qubit Pauli operations to get a multi-qubit Pauli string (i.e. cirq.X(a) * cirq.Y(b) * cirq.Z(c) is a valid 3-qubit operation).
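
To make the shape of that hierarchy concrete, here is a toy sketch in plain Python. The Toy* classes are hypothetical and much simpler than Cirq's real ones (no phases, no overlapping qubits); they only illustrate how one object can be both a gate operation and a one-term Pauli string, and why constructing it involves more work than constructing a plain gate operation.

```python
class ToyGateOperation:
    """A gate applied to qubits (toy version)."""

    def __init__(self, gate: str, qubits: tuple):
        self.gate = gate
        self.qubits = qubits


class ToyPauliString:
    """A product of single-qubit Paulis, stored as {qubit: pauli}."""

    def __init__(self, paulis: dict):
        self.paulis = dict(paulis)

    def __mul__(self, other: "ToyPauliString") -> "ToyPauliString":
        # Toy rule: only disjoint qubits are supported, so no phases arise.
        if set(self.paulis) & set(other.paulis):
            raise ValueError("toy sketch only multiplies Paulis on distinct qubits")
        return ToyPauliString({**self.paulis, **other.paulis})


class ToySinglePauliOperation(ToyGateOperation, ToyPauliString):
    """Both a gate operation and a one-term Pauli string."""

    def __init__(self, pauli: str, qubit: str):
        # Two base-class initializers run for every construction.
        ToyGateOperation.__init__(self, pauli, (qubit,))
        ToyPauliString.__init__(self, {qubit: pauli})


x_a = ToySinglePauliOperation("X", "a")
y_b = ToySinglePauliOperation("Y", "b")
z_c = ToySinglePauliOperation("Z", "c")
product = x_a * y_b * z_c
print(product.paulis)  # {'a': 'X', 'b': 'Y', 'c': 'Z'}
```

In this sketch the product is a plain ToyPauliString, while each factor is usable anywhere a ToyGateOperation is expected.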

After some investigation, it looks like the relative imports within the Pauli.on() method were hurting performance. I've opened #6316 to fix this. I used the following code as a test and compared performance before and after my PR.

Test Code

q = cirq.q(0)
xhalf = cirq.X**0.5
with cirq.with_debug(False):    
    %timeit cirq.I(q)
    %timeit cirq.X(q)
    %timeit xhalf(q)

Before my PR / on master

770 ns ± 7.84 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
3.35 µs ± 108 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
747 ns ± 6.61 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

As you can see, the single-qubit Pauli operation here is ~4x slower than the other two.

After my PR

803 ns ± 25.8 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
1.87 µs ± 19.3 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
752 ns ± 35.9 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In this case, the single-qubit Pauli operation is now only ~2x slower. This is as fast as it gets with the current type hierarchy.

If this is still a bottleneck, maybe you can share your exact workflow and we can look for potential optimizations that avoid major changes to Cirq's type hierarchy, which would be a significant backwards-incompatible change.
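
For readers wondering why an import inside a hot method matters at the microsecond scale, here is a small standard-library sketch. It is not the change made in #6316 (that diff is not shown in this thread) and it uses an absolute import of math rather than a relative import inside a package; the function names are hypothetical. It only shows that an import statement inside a function body is re-executed on every call, even though the module is already cached.

```python
import math  # module-level import: resolved once, at load time
import timeit


def sqrt_with_local_import(x: float) -> float:
    # The import statement runs on every call. After the first call it is
    # a cache lookup plus a name binding, which is cheap but not free.
    import math

    return math.sqrt(x)


def sqrt_with_module_import(x: float) -> float:
    # Uses the name bound once by the module-level import.
    return math.sqrt(x)


n = 200_000
t_local = timeit.timeit(lambda: sqrt_with_local_import(2.0), number=n)
t_module = timeit.timeit(lambda: sqrt_with_module_import(2.0), number=n)
print(f"import inside function: {t_local / n * 1e9:.0f} ns per call")
print(f"import at module level: {t_module / n * 1e9:.0f} ns per call")
```

The absolute numbers depend on the machine and Python version, so compare the two lines against each other rather than against the timings quoted in this issue.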

xref #6097

zchen088 added the kind/bug-report Something doesn't seem to work. label Aug 30, 2023

tanujkhattar self-assigned this Oct 13, 2023

tanujkhattar added the area/performance label Oct 13, 2023

tanujkhattar mentioned this issue Oct 13, 2023

Speed up construction of single qubit pauli operations - cirq.X(q) #6316

Merged

maffoo closed this as completed in #6316 Dec 20, 2023
