Skip to content

Commit 3da3d5a

Browse files
mtreinishjakelishmanElePT
authored
Introduce custom sympy srepr parser [stable/1.4] (#14022)
* Introduce custom sympy srepr parser This commit introduces a custom parser to QPY for parameter expression payloads that were generated using sympy. Prior to QPY version 10 this was the only way we supported serializing parameter expressions in QPY. For QPY version 10, 11, and 12 sympy could optionally be used if the payload was generated explicitly to not use symengine (in qiskit 1.0 it defaulted to use symengine). This serialization format relied on sympy to generate a string representation of the expression which we then put in the payload. On deserialization we called sympy's `parse_expr()` function which internally is calling sympy's `sympify()` internally. Sympy documents that `sympify()` relies on Python's `eval()` for string input and should not be used with untrusted input. But `parse_expr()` didn't have such a warning (at the time, I plan to contribute adding one), so using this function provides an avenue for arbitrary code execution during QPY deserialization. This commit fixes this issue by writing a custom parser for the string repesentation in a QPY payload based on python's ast module. This parser walks the abstract syntax tree and builds the sympy expression object as it it goes. It is restricted to the operations that `ParameterExpression` supports and if any part of the string tries to use functionality outside that set it will error. For `ScheduleBlock` objects which also serialize a symbolic expression there is no new parser to securely load the payload. This is because the data model of `ScheduleBlock` enables using any `sympy` expression and the limited parser is not sufficient to represent the schedule block. Since this functionality is deprecated and will be removed in Qiskit 2.0.0, potentially vulnerable payloads are rejected by default now. If the input is trusted by the user they can explicitly opt-in to using it by setting a new flag to trust the input. This is technically a breaking API change, but since the payload is potentially vulnerable this is a necessary change to secure QPY. * Fix calibrations deserialization The calibrations path was apparently never using symengine encoding for pulse schedules even if it was available and set. This is a bug but fixing it prevents us from loading historical payloads. Since this is pre-existing this restores the previous behavior. We can fix the underlying bug in an independent PR for a 1.4.x release that uses the qiskit version as a gate on how to interpret the symbolic encoding flag. * Suppress lint errors in ast walker * Set trust flag on qpy tests * Apply suggestions from code review Co-authored-by: Jake Lishman <[email protected]> * Check we don't call eval via sympy ever This commit looks for all the calls to sympy that internally call eval() and either does a runtime check to ensure we're not passing user defined strings to those functions or rewrites the functions to not rely on those sympy functions. * Add loud warning to the qpy.load() docs * Make untrusted sympy exception local to sympy usage Previously the exception raised in the pulse path was over eagerly raising for any pulse payload using sympy encoding, whether it contained a symbolic pulse or not. This commit fixes this oversight so that the exception is only raised when we go to use sympy's deserialization. Tests are added to ensure we still raise with a symbolic pulse but not if the schedule block doesn't contain any symbolic expressions. * Apply suggestions from code review Co-authored-by: Elena Peña Tapia <[email protected]> * Fix trailing whitespace * Prepare fix as 1.4.2 release --------- Co-authored-by: Jake Lishman <[email protected]> Co-authored-by: Elena Peña Tapia <[email protected]>
1 parent 1d334fa commit 3da3d5a

File tree

16 files changed

+422
-45
lines changed

16 files changed

+422
-45
lines changed

Cargo.toml

+1-1
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@ members = ["crates/*"]
33
resolver = "2"
44

55
[workspace.package]
6-
version = "1.4.1"
6+
version = "1.4.2"
77
edition = "2021"
88
rust-version = "1.70" # Keep in sync with README.md and rust-toolchain.toml.
99
license = "Apache-2.0"

docs/conf.py

+1-1
Original file line numberDiff line numberDiff line change
@@ -32,7 +32,7 @@
3232
# The short X.Y version
3333
version = "1.4"
3434
# The full version, including alpha/beta/rc tags
35-
release = "1.4.1"
35+
release = "1.4.2"
3636

3737
language = "en"
3838

qiskit/VERSION.txt

+1-1
Original file line numberDiff line numberDiff line change
@@ -1 +1 @@
1-
1.4.1
1+
1.4.2

qiskit/circuit/parameterexpression.py

+6-1
Original file line numberDiff line numberDiff line change
@@ -25,7 +25,7 @@
2525
import numpy
2626
import symengine
2727

28-
from qiskit.circuit.exceptions import CircuitError
28+
from qiskit.circuit.exceptions import CircuitError, QiskitError
2929

3030
# This type is redefined at the bottom to insert the full reference to "ParameterExpression", so it
3131
# can safely be used by runtime type-checkers like Sphinx. Mypy does not need this because it
@@ -528,6 +528,9 @@ def __repr__(self):
528528
def __str__(self):
529529
from sympy import sympify, sstr
530530

531+
if not isinstance(self._symbol_expr, symengine.Basic):
532+
raise QiskitError("Invalid ParameterExpression")
533+
531534
return sstr(sympify(self._symbol_expr), full_prec=False)
532535

533536
def __complex__(self):
@@ -608,6 +611,8 @@ def __eq__(self, other):
608611
return False
609612
from sympy import sympify
610613

614+
if not isinstance(self._symbol_expr, symengine.Basic):
615+
raise QiskitError("Invalid ParameterExpression")
611616
return sympify(self._symbol_expr).equals(sympify(other._symbol_expr))
612617
elif isinstance(other, numbers.Number):
613618
return len(self.parameters) == 0 and complex(self._symbol_expr) == other

qiskit/circuit/tools/pi_check.py

+3
Original file line numberDiff line numberDiff line change
@@ -49,7 +49,10 @@ def pi_check(inpt, eps=1e-9, output="text", ndigits=None):
4949
if isinstance(inpt, ParameterExpression):
5050
param_str = str(inpt)
5151
from sympy import sympify
52+
import symengine
5253

54+
if not isinstance(inpt._symbol_expr, symengine.Basic):
55+
raise QiskitError("Invalid ParameterExpression provided")
5356
expr = sympify(inpt._symbol_expr)
5457
syms = expr.atoms()
5558
for sym in syms:

qiskit/qpy/binary_io/circuits.py

+16-4
Original file line numberDiff line numberDiff line change
@@ -638,7 +638,7 @@ def _read_custom_operations(file_obj, version, vectors):
638638
return custom_operations
639639

640640

641-
def _read_calibrations(file_obj, version, vectors, metadata_deserializer):
641+
def _read_calibrations(file_obj, version, vectors, metadata_deserializer, trust_input=False):
642642
calibrations = {}
643643

644644
header = formats.CALIBRATION._make(
@@ -656,7 +656,12 @@ def _read_calibrations(file_obj, version, vectors, metadata_deserializer):
656656
params = tuple(
657657
value.read_value(file_obj, version, vectors) for _ in range(defheader.num_params)
658658
)
659-
schedule = schedules.read_schedule_block(file_obj, version, metadata_deserializer)
659+
schedule = schedules.read_schedule_block(
660+
file_obj,
661+
version,
662+
metadata_deserializer,
663+
trust_input=trust_input,
664+
)
660665

661666
if name not in calibrations:
662667
calibrations[name] = {(qubits, params): schedule}
@@ -1327,7 +1332,9 @@ def write_circuit(
13271332
_write_layout(file_obj, circuit)
13281333

13291334

1330-
def read_circuit(file_obj, version, metadata_deserializer=None, use_symengine=False):
1335+
def read_circuit(
1336+
file_obj, version, metadata_deserializer=None, use_symengine=False, trust_input=False
1337+
):
13311338
"""Read a single QuantumCircuit object from the file like object.
13321339
13331340
Args:
@@ -1345,6 +1352,7 @@ def read_circuit(file_obj, version, metadata_deserializer=None, use_symengine=Fa
13451352
supported in all platforms. Please check that your target platform is supported by
13461353
the symengine library before setting this option, as it will be required by qpy to
13471354
deserialize the payload.
1355+
trust_input (bool): If true, deserialize vulnerable schedule block payloads.
13481356
Returns:
13491357
QuantumCircuit: The circuit object from the file.
13501358
@@ -1454,7 +1462,11 @@ def read_circuit(file_obj, version, metadata_deserializer=None, use_symengine=Fa
14541462
# Read calibrations
14551463
if version >= 5:
14561464
circ._calibrations_prop = _read_calibrations(
1457-
file_obj, version, vectors, metadata_deserializer
1465+
file_obj,
1466+
version,
1467+
vectors,
1468+
metadata_deserializer,
1469+
trust_input=trust_input,
14581470
)
14591471

14601472
for vec_name, (vector, initialized_params) in vectors.items():
+121
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,121 @@
1+
# This code is part of Qiskit.
2+
#
3+
# (C) Copyright IBM 2025.
4+
#
5+
# This code is licensed under the Apache License, Version 2.0. You may
6+
# obtain a copy of this license in the LICENSE.txt file in the root directory
7+
# of this source tree or at http://www.apache.org/licenses/LICENSE-2.0.
8+
#
9+
# Any modifications or derivative works of this code must retain this
10+
# copyright notice, and modified files need to carry a notice indicating
11+
# that they have been altered from the originals.
12+
13+
"""Parser for sympy expressions srepr from ParameterExpression internals."""
14+
15+
import ast
16+
17+
from qiskit.qpy.exceptions import QpyError
18+
19+
20+
ALLOWED_CALLERS = {
21+
"Abs",
22+
"Add",
23+
"Sub",
24+
"Mul",
25+
"Div",
26+
"Pow",
27+
"Symbol",
28+
"Integer",
29+
"Rational",
30+
"Complex",
31+
"Float",
32+
"log",
33+
"sin",
34+
"cos",
35+
"tan",
36+
"atan",
37+
"acos",
38+
"asin",
39+
"exp",
40+
"conjugate",
41+
}
42+
43+
UNARY = {
44+
"sin",
45+
"cos",
46+
"tan",
47+
"atan",
48+
"acos",
49+
"asin",
50+
"conjugate",
51+
"Symbol",
52+
"Integer",
53+
"Complex",
54+
"Abs",
55+
"Float",
56+
}
57+
58+
59+
class ParseSympyWalker(ast.NodeVisitor):
60+
"""A custom ast walker that is passed the sympy srepr from QPY < 13 and creates a custom
61+
expression."""
62+
63+
def __init__(self):
64+
self.stack = []
65+
66+
def visit_UnaryOp(self, node: ast.UnaryOp): # pylint: disable=invalid-name
67+
"""Visit a python unary op node"""
68+
self.visit(node.operand)
69+
arg = self.stack.pop()
70+
if isinstance(node.op, ast.UAdd):
71+
self.stack.append(+arg)
72+
elif isinstance(node.op, ast.USub):
73+
self.stack.append(-arg)
74+
elif isinstance(node.op, ast.Not):
75+
self.stack.append(not arg)
76+
elif isinstance(node.op, ast.Invert):
77+
self.stack.append(~arg)
78+
else:
79+
raise QpyError(f"Invalid unary op as part of sympy srepr: {node.op}")
80+
81+
def visit_Constant(self, node: ast.Constant): # pylint: disable=invalid-name
82+
"""Visit a constant node."""
83+
self.stack.append(node.value)
84+
85+
def visit_Call(self, node: ast.Call): # pylint: disable=invalid-name
86+
"""Visit a call node
87+
88+
This can only be parameter expression allowed sympy call types.
89+
"""
90+
import sympy
91+
92+
if isinstance(node.func, ast.Name):
93+
name = node.func.id
94+
else:
95+
raise QpyError("Unknown node type")
96+
97+
if name not in ALLOWED_CALLERS:
98+
raise QpyError(f"{name} is not part of a valid sympy expression srepr")
99+
100+
args = node.args
101+
if name in UNARY:
102+
if len(args) != 1:
103+
raise QpyError(f"{name} has an invalid number of args in sympy srepr")
104+
self.visit(args[0])
105+
obj = getattr(sympy, name)(self.stack.pop())
106+
self.stack.append(obj)
107+
else:
108+
for arg in args:
109+
self.visit(arg)
110+
out_args = [self.stack.pop() for _ in range(len(args))]
111+
out_args.reverse()
112+
obj = getattr(sympy, name)(*out_args)
113+
self.stack.append(obj)
114+
115+
116+
def parse_sympy_repr(sympy_repr: str):
117+
"""Parse a given sympy srepr into a symbolic expression object."""
118+
tree = ast.parse(sympy_repr, mode="eval")
119+
visitor = ParseSympyWalker()
120+
visitor.visit(tree)
121+
return visitor.stack.pop()

0 commit comments

Comments
 (0)