Support IsolationLevels and Concurrency Safety Validation Checks #819

Open
sungwy opened this issue Jun 14, 2024 · 9 comments
@sungwy
Collaborator

sungwy commented Jun 14, 2024

Feature Request / Improvement

Support enforcing Isolation Levels from specified snapshot ID

https://iceberg.apache.org/docs/latest/spark-configuration/#write-options

There's been a lot of continued interest in using multiple PyIceberg applications concurrently and having proper support for optimistic concurrency.

I think the best place to start is by implementing the individual validation functions.

Once this is complete, we'll be able to introduce the Isolation Levels and correctly implement the validation logic in the _OverwriteFiles snapshot producer, similar to the Java implementation.
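To make the idea concrete, here is a minimal, self-contained sketch of what an isolation-level check from a starting snapshot ID could look like. It is a toy model, not PyIceberg code: `ToySnapshot`, `snapshots_since`, `validate_from_snapshot`, and `ValidationError` are hypothetical names, and conflicts are tracked per partition value rather than per data file. The rule it encodes (serializable rejects concurrent additions and deletions in the touched partitions, while snapshot isolation rejects only concurrent deletions) reflects my understanding of the Java behavior and should be checked against that implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, FrozenSet, List, Optional


class IsolationLevel(Enum):
    SERIALIZABLE = "serializable"
    SNAPSHOT = "snapshot"


class ValidationError(Exception):
    """Raised when a concurrent commit conflicts with the pending write."""


@dataclass(frozen=True)
class ToySnapshot:
    # Hypothetical stand-in for a table snapshot; real snapshots track files, not partitions.
    snapshot_id: int
    parent_id: Optional[int]
    added_partitions: FrozenSet[str] = frozenset()
    deleted_partitions: FrozenSet[str] = frozenset()


def snapshots_since(
    snapshots: Dict[int, ToySnapshot], current_id: int, starting_id: Optional[int]
) -> List[ToySnapshot]:
    """Return snapshots committed after starting_id, walking parent pointers from current_id."""
    result: List[ToySnapshot] = []
    cursor: Optional[int] = current_id
    while cursor is not None and cursor != starting_id:
        snap = snapshots[cursor]
        result.append(snap)
        cursor = snap.parent_id
    if starting_id is not None and cursor is None:
        # The walk reached the root without finding the starting snapshot.
        raise ValidationError(f"snapshot {starting_id} is not an ancestor of {current_id}")
    return result


def validate_from_snapshot(
    snapshots: Dict[int, ToySnapshot],
    current_id: int,
    starting_id: Optional[int],
    touched_partitions: FrozenSet[str],
    level: IsolationLevel,
) -> None:
    """Fail if a commit made after starting_id conflicts with the partitions being written."""
    for snap in snapshots_since(snapshots, current_id, starting_id):
        # Both levels reject concurrent deletes in the touched partitions.
        if snap.deleted_partitions & touched_partitions:
            raise ValidationError(f"conflicting deletes in snapshot {snap.snapshot_id}")
        # Only serializable also rejects concurrently added data.
        if level is IsolationLevel.SERIALIZABLE and snap.added_partitions & touched_partitions:
            raise ValidationError(f"conflicting data in snapshot {snap.snapshot_id}")


# A writer that started from snapshot 1 while two other commits landed.
history = {
    1: ToySnapshot(1, None),
    2: ToySnapshot(2, 1, added_partitions=frozenset({"2024-06-14"})),
    3: ToySnapshot(3, 2, deleted_partitions=frozenset({"2024-06-13"})),
}
```

With this history, a writer touching only the "2024-06-14" partition passes under `IsolationLevel.SNAPSHOT` but fails under `IsolationLevel.SERIALIZABLE`, because snapshot 2 added data to that partition after the writer's starting snapshot.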

@sungwy sungwy self-assigned this Jun 14, 2024
@jqin61
Contributor

jqin61 commented Jun 14, 2024

Hi, I am interested in working on it!

@sungwy sungwy assigned jqin61 and unassigned sungwy Jun 14, 2024
@sungwy
Collaborator Author

sungwy commented Jun 14, 2024

Some relevant links to the Java implementation

@kevinjqliu kevinjqliu removed this from the PyIceberg 0.9.0 release milestone Feb 1, 2025
@sungwy sungwy changed the title Support IsolationLevels Support IsolationLevels and Concurrency Safety Validation Checks Apr 18, 2025
@guptaakashdeep
Contributor

Hey @sungwy, I would like to contribute by working on these.

Is there one of these that I can pick up and start looking into, such as one of the initial validation implementations?

@sungwy
Collaborator Author

sungwy commented Apr 18, 2025

@guptaakashdeep yes, I don't think there's a particular order we should implement these in, so please feel free to assign yourself to the one you find most interesting!

Sung

@guptaakashdeep
Contributor

Thanks @sungwy! Is there an existing class where I can implement these validation functions, or should we add them directly in snapshot.py?

@sungwy
Collaborator Author

sungwy commented Apr 18, 2025

I think we could create a new module, pyiceberg/table/update/validate.py, and add these validation checks there. What do you think, @guptaakashdeep?

@guptaakashdeep
Contributor

Sounds good, @sungwy!

@jayceslesar
Contributor

@guptaakashdeep @sungwy see #1935, which should provide the building blocks needed to crank out the 4 sub-issues.

@jayceslesar
Contributor

jayceslesar commented Apr 19, 2025

Also going to crank out a manifest group implementation today.

Edit: @sungwy, it looks like the manifestgroup.entries method is extremely similar to the DataScan defined in the Table __init__.py file. What do you think?
