Skip to content

Clarify flattening behavior and fix required output shape for inverse indices in unique_* APIs #700

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

kgryte
Copy link
Contributor

@kgryte kgryte commented Nov 6, 2023

This PR

  • resolves unique_all doesn't specify that indices should be a flattened index #588 by clarifying flattening behavior and fixing the required output shape for returned inverse indices.
  • makes explicit in the draft specification that flattening should be in row-major, C-style order. This is not backported to prior revisions. While this flattening behavior may have been assumed at the time of writing the specifications, we shouldn't be post-writing this assumption into prior revisions. Instead, we clarify expected behavior going forward. A strict reading of the prior revisions by downstream consumers should entail no guarantees regarding flattening order.
  • makes explicit in the draft specification that the count order should match the order in values (i.e., the returned array of unique values). This was not made explicit in prior revisions, and its omission might indicate that counts could be returned in any order. While unlikely, we prohibit arbitrary count order for subsequent revisions.
  • backports the following changes to the 2021 and 2022 revisions of the Array API standard:
    • correction to the required output shape of inverse_indices. Prior revisions stated that this should have the same shape as x, but this is only true if x is one-dimensional. For multi-dimensional x, the output shape should the shape of a flattened x.
    • clarifications regarding the contents of returned indices (first occurrences). Namely, that the returned array should include indices into a flattened x. For the case where x is one-dimensional, the guidance is the same. For multi-dimensional x, this clarifies that returned indices are for the flattened x and not the original input array.

…ndices`

This commit backports the following changes to the 2021 and 2022
revisions of the Array API standard:

- correction to the required output shape of `inverse_indices`.
Prior revisions stated that this should have the same shape as `x`,
but this is only true if `x` is one-dimensional. For multi-dimensional
`x`, the output shape should the size of a flattened `x`.
- clarifications regarding the contents of returned indices (first
occurrences). Namely, that the returned array should include indices
into a flattened `x`. For the case where `x` is one-dimensional,
the guidance is the same. For multi-dimensional `x`, this clarifies
that returned indices are for the flattened `x` and not the original
input array.
@kgryte kgryte added the Backport Changes involve backporting to previous versions. label Nov 6, 2023
@kgryte kgryte added this to the v2023 milestone Nov 6, 2023
@kgryte
Copy link
Contributor Author

kgryte commented Nov 6, 2023

@asmeurer Would you mind reviewing since you submitted the original issue? Thanks!

@rgommers
Copy link
Member

Eh, now seeing this right after NumPy changed it - consensus in the discussion that flattening was considered a bug, see numpy/numpy#25553. I didn't really follow that PR, so let me ping @jakevdp here for his opinion on what to do with this.

@jakevdp
Copy link

jakevdp commented Jan 11, 2024

The goal of that PR was to make np.unique_all and np.unique_inverse compatible with the 2022.12 array API. After I sent the PR, reviewers suggested we change the output shape in np.unique as well.

For the array API, I think the current specification (i.e. not flattening inverse_indices) makes the most sense, because then you can reconstruct the input using unique_values[inverse_indices], regardless of the input shape.

Copy link
Contributor

@leofang leofang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Mark it in red to avoid accidental merge.) So should we just close this PR now that NumPy devs also agreed we don't need flattening?

@leofang
Copy link
Contributor

leofang commented Feb 12, 2024

Q: @kgryte it seems we can just close this PR?

@kgryte
Copy link
Contributor Author

kgryte commented Feb 13, 2024

@leofang Yes, I will close and move the clarifications regarding flattening behavior to a new PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Backport Changes involve backporting to previous versions.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

unique_all doesn't specify that indices should be a flattened index
4 participants