Skip to content

Support writing to a table with sort-order #271

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Fokko opened this issue Jan 16, 2024 · 7 comments · May be fixed by #871
Open

Support writing to a table with sort-order #271

Fokko opened this issue Jan 16, 2024 · 7 comments · May be fixed by #871
Assignees

Comments

@Fokko
Copy link
Contributor

Fokko commented Jan 16, 2024

Feature Request / Improvement

We fail when we see a sort order, it would be great if we could sort+write the data based on the sort-order.

@vinjai
Copy link

vinjai commented Jun 6, 2024

Hi @Fokko
I would like to give a shot at this if no one has already taken it.

@Fokko
Copy link
Contributor Author

Fokko commented Jun 7, 2024

@vinjai Great, I've assigned it to you!

@vinjai vinjai linked a pull request Jun 29, 2024 that will close this issue
@vinjai
Copy link

vinjai commented Jul 1, 2024

Hey @Fokko

Question around transformation defined in SortOrder:
Our input for sorting is a pyarrow table in the append and override methods. Here we have two options:

  1. Sort the pyarrow table using the pyarrow transform. A lot of transforms have not been implemented or supported in pyarrow. This introduces two scenarios:
    • Breaking Change: If the transformation is not supported, we don't write ahead raising an appropriate error.
    • Silently moving ahead: Add unsorted order id and ignore the sort functionality altogether
  2. Convert the pyarrow table to a python object and then sort within python.

I am more in favor of the first one.

@vinjai
Copy link

vinjai commented Jul 2, 2024

For instance (in BucketTransform):

def pyarrow_transform(self, source: IcebergType) -> "Callable[[pa.Array], pa.Array]":
raise NotImplementedError()

@Fokko
Copy link
Contributor Author

Fokko commented Jul 3, 2024

@vinjai Since we ignore the write-order today, I think proceeding is fine. Maybe raise a warning so the user knows the data isn't being sorted. Sorting in Python would be very costly.

@Fokko
Copy link
Contributor Author

Fokko commented Oct 30, 2024

Let's pass this to the next release when we have all the transforms implemented using the Rust extension. cc @sungwy

@vinjai
Copy link

vinjai commented Nov 7, 2024

Hey @Fokko

I'm not completely sure about the details of the Rust Extension implementation.
Could you share any documents/proposals/issues that explain what we're aiming to achieve?
I’d love to help out in any way I can to move things forward 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants