Commit 8adf246

Support quoted column identifiers for scan row_filter (#1863)
# Rationale for this change

Our data lake uses old-school Kimball-style quoted column names ("User ID", "Customer Name", etc.). The string parser for `row_filter` was unable to parse these. Now it can.

Example:

```python
# before
>>> parser.parse("\"User Name\" = 'ted'")
ParseException: Expected '"', found ' '

# after
>>> parser.parse("\"User Name\" = 'ted'")
EqualTo("User Name", "ted")
```

# Are these changes tested?

Yes, a new test was added.

> [!NOTE]
> The `quoted_column_with_dots` test previously errored with `"Expected '"', found '.'"` _when using **double quotes only**_. It now raises error text expecting an `'or'` value; I didn't toil over finding where the exception is clobbered, because the error messages for the single-quote and double-quote cases are inconsistent and I didn't really consider this a polished, first-class error message. If this change is an issue, I can dig further to try to revert the wording change; IMO raising the same exception type is more than reasonable grounds to consider the change non-breaking.

# Are there any user-facing changes?

Yes, quoted identifiers are now supported.
1 parent da403d2 commit 8adf246

2 files changed: +16 -3 lines changed

pyiceberg/expressions/parser.py

+12 -1

@@ -22,8 +22,10 @@
     DelimitedList,
     Group,
     MatchFirst,
+    ParseException,
     ParserElement,
     ParseResults,
+    QuotedString,
     Suppress,
     Word,
     alphanums,
@@ -79,7 +81,16 @@
 LIKE = CaselessKeyword("like")
 
 unquoted_identifier = Word(alphas + "_", alphanums + "_$")
-quoted_identifier = Suppress('"') + unquoted_identifier + Suppress('"')
+quoted_identifier = QuotedString('"', escChar="\\", unquoteResults=True)
+
+
+@quoted_identifier.set_parse_action
+def validate_quoted_identifier(result: ParseResults) -> str:
+    if "." in result[0]:
+        raise ParseException("Expected '\"', found '.'")
+    return result[0]
+
+
 identifier = MatchFirst([unquoted_identifier, quoted_identifier]).set_results_name("identifier")
 column = DelimitedList(identifier, delim=".", combine=False).set_results_name("column")
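
The behavior the new `quoted_identifier` rule aims for (accept a double-quoted name containing spaces, honor a backslash escape, and still reject dots) can be illustrated with a small stdlib-only sketch. This is not how pyiceberg implements it: the helper name `parse_quoted_identifier` is hypothetical, it uses `re` rather than pyparsing, and it raises `ValueError` where the real parser raises `ParseException`.

```python
import re

# A double-quoted string that allows backslash escapes,
# e.g. "User Name" or "a\"b".
_QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')


def parse_quoted_identifier(text: str) -> str:
    """Return the unquoted identifier or raise ValueError (hypothetical helper)."""
    match = _QUOTED.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a double-quoted identifier: {text!r}")
    # Drop each escape character, keeping the character it protects.
    name = re.sub(r"\\(.)", r"\1", match.group(1))
    # Mirror the validation step in the diff: a dot inside the quotes
    # is rejected, since "." separates the parts of a column reference.
    if "." in name:
        raise ValueError("Expected '\"', found '.'")
    return name


print(parse_quoted_identifier('"User Name"'))  # User Name
print(parse_quoted_identifier(r'"a\"b"'))      # a"b
```

Calling `parse_quoted_identifier('"foo.bar"')` raises, which corresponds to the case that `test_quoted_column_with_dots` continues to cover.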

tests/expressions/test_parser.py

+4 -2

@@ -230,9 +230,11 @@ def test_quoted_column_with_dots() -> None:
     with pytest.raises(ParseException) as exc_info:
         parser.parse("\"foo.bar\".baz = 'data'")
 
-    assert "Expected '\"', found '.'" in str(exc_info.value)
-
     with pytest.raises(ParseException) as exc_info:
         parser.parse("'foo.bar'.baz = 'data'")
 
     assert "Expected <= | <> | < | >= | > | == | = | !=, found '.'" in str(exc_info.value)
+
+
+def test_quoted_column_with_spaces() -> None:
+    assert EqualTo("Foo Bar", "data") == parser.parse("\"Foo Bar\" = 'data'")
