Skip to content

fix: improve client-side regex statement parser #1328

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 3, 2025
Merged

Conversation

olavloite
Copy link
Contributor

The client-side regex-based statement parser contained multiple minor errors, like:

  • BEGIN would match any string as BEGIN TRANSACTION (including stuff like BEGIN foo)
  • COMMIT and ROLLBACK had the same problem as BEGIN.
  • Mismatches were reported as UPDATE. They are now returned as UNKNOWN.
  • DLL missed the ANALYZE keyword

The client-side regex-based statement parser contained multiple minor errors, like:
- BEGIN would match any string as BEGIN TRANSACTION (including stuff like `BEGIN foo`)
- COMMIT and ROLLBACK had the same problem as BEGIN.
- Mismatches were reported as UPDATE. They are now returned as UNKNOWN.
- DLL missed the ANALYZE keyword
@olavloite olavloite requested review from a team as code owners March 25, 2025 15:32
@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: spanner Issues related to the googleapis/python-spanner API. labels Mar 25, 2025
@olavloite olavloite requested a review from surbhigarg92 March 26, 2025 12:50
if (
parsed_statement.statement_type != StatementType.UPDATE
and parsed_statement.statement_type != StatementType.INSERT
and parsed_statement.statement_type != StatementType.UNKNOWN
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This provides a fallback if the statement is not recognized by the client-side parser. Instead of failing directly, we allow it to be sent to Spanner and let the Spanner parser determine whether it is valid or not. This prevents potential new DML syntax in the future from being rejected here (e.g. if we were to support merge statements).

RE_BEGIN = re.compile(r"^\s*(BEGIN|START)(TRANSACTION)?", re.IGNORECASE)
RE_COMMIT = re.compile(r"^\s*(COMMIT)(TRANSACTION)?", re.IGNORECASE)
RE_ROLLBACK = re.compile(r"^\s*(ROLLBACK)(TRANSACTION)?", re.IGNORECASE)
RE_BEGIN = re.compile(r"^\s*(BEGIN|START)(\s+TRANSACTION)?\s*$", re.IGNORECASE)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lack of $ at the end of these regular expressions meant that statements like begin whatever string follows does not matter to be accepted as valid statements.

@@ -702,10 +698,6 @@ def set_autocommit_dml_mode(
self._autocommit_dml_mode = autocommit_dml_mode

def _partitioned_query_validation(self, partitioned_query, statement):
if _get_statement_type(Statement(partitioned_query)) is not StatementType.QUERY:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skip validation here and let Spanner return an error if the statement is not valid for PartitionQuery.

RE_NON_UPDATE = re.compile(r"^\W*(SELECT|GRAPH|FROM)", re.IGNORECASE)

RE_WITH = re.compile(r"^\s*(WITH)", re.IGNORECASE)

# DDL statements follow
# https://cloud.google.com/spanner/docs/data-definition-language
RE_DDL = re.compile(
r"^\s*(CREATE|ALTER|DROP|GRANT|REVOKE|RENAME)", re.IGNORECASE | re.DOTALL
r"^\s*(CREATE|ALTER|DROP|GRANT|REVOKE|RENAME|ANALYZE)", re.IGNORECASE | re.DOTALL
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ANALYZE is a valid DDL statement in Spanner.

re.IGNORECASE | re.DOTALL,
)
"""Deprecated: Use the RE_IS_INSERT, RE_IS_UPDATE, and RE_IS_DELETE regexes"""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a new deprecation, these regular expressions were already 'deprecated' in the sense that:

  1. They were no longer in use by the library
  2. They are not intended for public use

@olavloite olavloite requested a review from aakashanandg April 2, 2025 13:02
@olavloite olavloite merged commit b3c259d into main Apr 3, 2025
15 checks passed
@olavloite olavloite deleted the cleanup-regex-parser branch April 3, 2025 08:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the googleapis/python-spanner API. size: m Pull request size is medium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants