-
Notifications
You must be signed in to change notification settings - Fork 270
Allow non-string typed values in table properties #469
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow non-string typed values in table properties #469
Conversation
tests/integration/test_writes.py
Outdated
_ = _create_table( | ||
session_catalog, identifier, {"format-version": format_version, **property_with_none}, [arrow_table_with_null] | ||
) | ||
assert "NullPointerException: null value in entry: property_name=null" in str(exc_info.value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Fokko this throws a server-side error, not sure if this should be corrected on the server implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we accept null values. Probably we want to catch this in PyIceberg before doing the request. Other backends might have different behavior so we want to make sure that we follow the correct behavior in the client itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good @kevinjqliu, thanks for working on this 👍
pyiceberg/table/metadata.py
Outdated
@field_validator("properties", mode='before') | ||
@classmethod | ||
def transform_dict_value_to_str(cls, dict: Dict[str, Any]) -> Dict[str, str]: | ||
assert None not in dict.values(), "None type is not a supported value in properties" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We try to avoid asserts outside of tests/
. Could you raise a ValueError
instead?
database_name, _table_name = random_identifier | ||
catalog.create_namespace(database_name) | ||
property_with_none = {"property_name": None} | ||
with pytest.raises(ValidationError) as exc_info: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect an exception being thrown by the field validator, instead of Pydantic itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe Pydantic catches the underlying error and reraise a ValidationError
from the docs,
"""
Pydantic will raise a ValidationError whenever it finds an error in the data it's validating.
Note
Validation code should not raise ValidationError itself, but rather raise a ValueError or AssertionError (or subclass thereof) which will be caught and used to populate ValidationError.
"""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I wasn't aware of that. Looks like ValidationError
extends ValueError
, so we're good here. Thanks!
tests/integration/test_writes.py
Outdated
def test_table_properties_raise_for_none_value( | ||
session_catalog: Catalog, | ||
arrow_table_with_null: pa.Table, | ||
format_version: str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
format_version: str, | |
format_version: int, |
tests/integration/test_writes.py
Outdated
_ = _create_table( | ||
session_catalog, identifier, {"format-version": format_version, **property_with_none}, [arrow_table_with_null] | ||
) | ||
assert "NullPointerException: null value in entry: property_name=null" in str(exc_info.value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we accept null values. Probably we want to catch this in PyIceberg before doing the request. Other backends might have different behavior so we want to make sure that we follow the correct behavior in the client itself.
6cc20b9
to
b4ea1b7
Compare
thanks for the review @Fokko, I've address the comments above, ptal |
pyiceberg/types.py
Outdated
for value in dict.values(): | ||
if value is None: | ||
raise ValueError("None type is not a supported value in properties") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would help the user to show which key is None
:
for value in dict.values(): | |
if value is None: | |
raise ValueError("None type is not a supported value in properties") | |
for key, value in dict: | |
if value is None: | |
raise ValueError(f"None type is not a supported value in property: {key}") |
@@ -425,7 +426,7 @@ def test_data_files(spark: SparkSession, session_catalog: Catalog, arrow_table_w | |||
|
|||
|
|||
@pytest.mark.integration | |||
@pytest.mark.parametrize("format_version", ["1", "2"]) | |||
@pytest.mark.parametrize("format_version", [1, 2]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! One minor comment 👍 Thanks for working on this @kevinjqliu
b4ea1b7
to
220054d
Compare
@Fokko good idea on adding the key! |
Thanks @kevinjqliu for fixing this 👍 |
* property accept int https://stackoverflow.com/questions/77304167/using-pydantic-to-change-int-to-string https://docs.pydantic.dev/latest/concepts/validators/\#field-validators * add tests * add integration tests * pr feedback * make validator reusable * show key when none
* property accept int https://stackoverflow.com/questions/77304167/using-pydantic-to-change-int-to-string https://docs.pydantic.dev/latest/concepts/validators/\#field-validators * add tests * add integration tests * pr feedback * make validator reusable * show key when none
Resolves #376
We want to be able to accept types other than string type in the
properties
field of Table and TableMetadata.For example, setting a value to an
int
typeThis PR adds a "before" field validator to
TableMetadataCommonFields
which will transform the values of theproperties
dict to str. Note, we explicitly disallowNone
type, since the transformation will changeNone
to str"None"
which is unintuitive.References