Allow non-string typed values in table properties #469

Merged

Contributor

@kevinjqliu kevinjqliu commented Feb 24, 2024

Resolves #376

We want to be able to accept types other than string type in the properties field of Table and TableMetadata.
For example, setting a value to an int:

create_table(..., properties={"write.parquet.compression-level": 42})

This PR adds a "before" field validator to TableMetadataCommonFields that converts the values of the properties dict to str. Note that None is explicitly disallowed, since the conversion would turn None into the string "None", which is unintuitive.
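As a rough illustration of the coercion described above, here is a minimal sketch of the value transformation on its own, without Pydantic. The function name `stringify_properties` is hypothetical and is not part of PyIceberg; in the PR the equivalent logic lives inside the field validator. The error message follows the wording that came out of the review below.

```python
from typing import Any, Dict


def stringify_properties(properties: Dict[str, Any]) -> Dict[str, str]:
    """Coerce every property value to str, rejecting None (hypothetical helper)."""
    for key, value in properties.items():
        if value is None:
            # str(None) would silently produce the string "None", so fail loudly instead.
            raise ValueError(f"None type is not a supported value in property: {key}")
    return {key: str(value) for key, value in properties.items()}


coerced = stringify_properties({"write.parquet.compression-level": 42, "format-version": 2})
print(coerced)  # {'write.parquet.compression-level': '42', 'format-version': '2'}
```

Rejecting None before converting keeps the failure on the client side, so it does not depend on how a particular catalog backend handles null property values.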

References

@kevinjqliu kevinjqliu changed the title from "Kevinjqliu/property value coerce to string" to "Allow non-string typed values in table properties" Feb 24, 2024
    _ = _create_table(
        session_catalog, identifier, {"format-version": format_version, **property_with_none}, [arrow_table_with_null]
    )
assert "NullPointerException: null value in entry: property_name=null" in str(exc_info.value)
Contributor Author

@Fokko this throws a server-side error. I'm not sure whether this should be fixed in the server implementation.

Contributor

I don't think we accept null values. Probably we want to catch this in PyIceberg before doing the request. Other backends might have different behavior so we want to make sure that we follow the correct behavior in the client itself.

@kevinjqliu kevinjqliu marked this pull request as ready for review February 24, 2024 20:05
Contributor

@Fokko Fokko left a comment

Looking good @kevinjqliu, thanks for working on this 👍

@field_validator("properties", mode='before')
@classmethod
def transform_dict_value_to_str(cls, dict: Dict[str, Any]) -> Dict[str, str]:
    assert None not in dict.values(), "None type is not a supported value in properties"
Contributor

We try to avoid asserts outside of tests/. Could you raise a ValueError instead?

database_name, _table_name = random_identifier
catalog.create_namespace(database_name)
property_with_none = {"property_name": None}
with pytest.raises(ValidationError) as exc_info:
Contributor

I would expect the exception to be thrown by the field validator, instead of Pydantic itself.

Contributor Author

I believe Pydantic catches the underlying error and re-raises it as a ValidationError.

From the docs:
"""
Pydantic will raise a ValidationError whenever it finds an error in the data it's validating.

Note
Validation code should not raise ValidationError itself, but rather raise a ValueError or AssertionError (or subclass thereof) which will be caught and used to populate ValidationError.
"""

Contributor

Ah, I wasn't aware of that. Looks like ValidationError extends ValueError, so we're good here. Thanks!
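
To make the exchange above concrete, here is a small sketch of that behavior, assuming Pydantic v2. `ExampleMetadata` and `validation_message` are hypothetical stand-ins for illustration, not PyIceberg classes or functions. The validator raises a plain ValueError, and Pydantic surfaces it as a ValidationError that still carries the original message.

```python
from typing import Any, Dict

from pydantic import BaseModel, Field, ValidationError, field_validator


class ExampleMetadata(BaseModel):  # hypothetical stand-in, not a PyIceberg class
    properties: Dict[str, str] = Field(default_factory=dict)

    @field_validator("properties", mode="before")
    @classmethod
    def reject_none_and_stringify(cls, value: Dict[str, Any]) -> Dict[str, str]:
        # Runs before Pydantic's own type validation of the field.
        for key, item in value.items():
            if item is None:
                raise ValueError(f"None type is not a supported value in property: {key}")
        return {key: str(item) for key, item in value.items()}


def validation_message(properties: Dict[str, Any]) -> str:
    """Return the text of the ValidationError, or "" if validation passes."""
    try:
        ExampleMetadata(properties=properties)
    except ValidationError as exc:
        return str(exc)
    return ""
```

Because ValidationError is a subclass of ValueError in Pydantic v2, a test can use either `pytest.raises(ValidationError)` or `pytest.raises(ValueError)` and then check the message for the offending key.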

def test_table_properties_raise_for_none_value(
    session_catalog: Catalog,
    arrow_table_with_null: pa.Table,
    format_version: str,
Contributor

Suggested change
- format_version: str,
+ format_version: int,

@kevinjqliu kevinjqliu force-pushed the kevinjqliu/property-value-coerce-to-string branch from 6cc20b9 to b4ea1b7 Compare February 27, 2024 07:07
@kevinjqliu
Contributor Author

Thanks for the review @Fokko, I've addressed the comments above, PTAL.

Comment on lines 67 to 69
for value in dict.values():
    if value is None:
        raise ValueError("None type is not a supported value in properties")
Contributor

I think it would help the user to show which key is None:

Suggested change
- for value in dict.values():
-     if value is None:
-         raise ValueError("None type is not a supported value in properties")
+ for key, value in dict.items():
+     if value is None:
+         raise ValueError(f"None type is not a supported value in property: {key}")

@@ -425,7 +426,7 @@ def test_data_files(spark: SparkSession, session_catalog: Catalog, arrow_table_w


@pytest.mark.integration
- @pytest.mark.parametrize("format_version", ["1", "2"])
+ @pytest.mark.parametrize("format_version", [1, 2])
Contributor

Nice :)

Contributor

@Fokko Fokko left a comment

Looks good! One minor comment 👍 Thanks for working on this @kevinjqliu

@kevinjqliu kevinjqliu force-pushed the kevinjqliu/property-value-coerce-to-string branch from b4ea1b7 to 220054d Compare February 29, 2024 15:53
@kevinjqliu
Contributor Author

@Fokko good idea on adding the key!

@Fokko Fokko merged commit d56dddd into apache:main Feb 29, 2024
@Fokko
Contributor

Fokko commented Feb 29, 2024

Thanks @kevinjqliu for fixing this 👍

@kevinjqliu kevinjqliu deleted the kevinjqliu/property-value-coerce-to-string branch February 29, 2024 21:20
himadripal pushed a commit to himadripal/iceberg-python that referenced this pull request Mar 1, 2024
Development

Successfully merging this pull request may close these issues.

Allow non-stringly typed table properties
2 participants