Skip to content

prepare_label_group should handle mismatches of label_name & the zattrs name of the label #619

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jluethi opened this issue Nov 30, 2023 · 7 comments

Comments

@jluethi
Copy link
Collaborator

jluethi commented Nov 30, 2023

Issue originally reported by @nrepina

If we copy a label image and give it a new name, the zattrs we get from an old label image also contain a name attribute => potential mismatches.

Example code

# Get the label_attrs correctly
label_attrs = get_zattrs(zarr_url=f"{rx_zarr_path}/labels/{label_name}")

rx_zarr_out = Path(output_path) / component
new_label_name = label_name + '_consensus'

# useful check for overwriting, adds metadata to labels group
_ = prepare_label_group(
    image_group=zarr.group(rx_zarr_out),
    label_name=new_label_name,
    overwrite=True,
    label_attrs=label_attrs,
    logger=logger,
)

Now the label image will be named label_name + '_consensus' in the folder, but the metadata still contains only label_name.

We can debate who's responsibility this should be. The easy fix is to add a line like this:

label_attrs['multiscales'][0]['name'] = new_label_name

But we tend to want to remove more of this metadata handling from tasks and put it into the helper functions.
At a minimum, prepare_label_group should give a warning if there is this mismatch, but maybe even just fix it and give priority to the explicit label_name parameter.

@jluethi
Copy link
Collaborator Author

jluethi commented Nov 30, 2023

Broader discussion: Should the prepare_label_group do any kind of attrs validation to ensure it's a valid OME-Zarr attr? Would also make sense

@jluethi
Copy link
Collaborator Author

jluethi commented Dec 6, 2023

label_attrs need to be valid NGFF attrs => check whether this is tested

@tcompa
Copy link
Collaborator

tcompa commented Dec 11, 2023

As of 9e11b56:

  1. We validate the label_attrs with our NgffImageMeta Pydantic model, and fail if attributes do not comply with specs
  2. If the multiscale name in label_attrs does not match with the label_name function argument, we update the multiscale name.

As of 48a014b:

  1. We make the label_attrs required, for prepare_label_group. @jluethi: we had not explicitly mentioned this change, but it seems natural to me that this function should not write a Zarr group without attributes. I can revert this change if we want the flexibility of preparing a group without attributes that can be modified later from within the task.

@tcompa
Copy link
Collaborator

tcompa commented Dec 11, 2023

Closed within #613. I'm re-opening, to make sure we agree on point 3 above:

We make the label_attrs required, for prepare_label_group. @jluethi: we had not explicitly mentioned this change, but it seems natural to me that this function should not write a Zarr group without attributes. I can revert this change if we want the flexibility of preparing a group without attributes that can be modified later from within the task.

@tcompa tcompa reopened this Dec 11, 2023
@jluethi
Copy link
Collaborator Author

jluethi commented Dec 12, 2023

We make the label_attrs required, for prepare_label_group. @jluethi: we had not explicitly mentioned this change, but it seems natural to me that this function should not write a Zarr group without attributes. I can revert this change if we want the flexibility of preparing a group without attributes that can be modified later from within the task.

Seems fair to me. Is there a reasonable behavior we'd do if a user doesn't provide zattrs? We couldn't know things like pixel sizes etc.

@tcompa
Copy link
Collaborator

tcompa commented Dec 12, 2023

Is there a reasonable behavior we'd do if a user doesn't provide zattrs? We couldn't know things like pixel sizes etc.

This is the current function signature:

def prepare_label_group(
    image_group: zarr.hierarchy.Group,
    label_name: str,
    label_attrs: dict[str, Any],
    overwrite: bool = False,
    logger: Optional[logging.Logger] = None,
) -> zarr.group:

Given these arguments, there's no attribute that we can directly infer. This is not about things like pixel sizes, but really about any attribute.


Things change if we make a further assumption, namely that we have access to the image being labeled (either by assuming we can go up two levels in the Zarr hierarchy and find it, or by having an additional function argument).
In that case, we could get access to the image attributes; if we then also had additional information (mainly the target pyramid level), we could re-build the full attributes of the label image, as is currently done e.g. in the cellpose task (or equivalently in the napari-workflows task):

    new_datasets = rescale_datasets(
        datasets=[ds.dict() for ds in ngff_image_meta.datasets],
        coarsening_xy=coarsening_xy,
        reference_level=level,
        remove_channel_axis=True,
    )

    label_attrs = {
        "image-label": {
            "version": __OME_NGFF_VERSION__,
            "source": {"image": "../../"},
        },
        "multiscales": [
            {
                "name": output_label_name,
                "version": __OME_NGFF_VERSION__,
                "axes": [
                    ax.dict()
                    for ax in ngff_image_meta.multiscale.axes
                    if ax.type != "channel"
                ],
                "datasets": new_datasets,
            }
        ],
    }

This option for sure looks interesting, in view of factoring out a functionality which is already repeated in two tasks (note however that this is not how things happen e.g. in https://github.com/fmi-basel/gliberal-scMultipleX/blob/nar-fractal/src/scmultiplex/fractal/relabel_by_linking_consensus.py#L157-L174).

If we take this latter option, then the prepare_label_group function scope becomes broader, as it will handle both its current feature (mostly overwrite-related checks) and the feature of preparing Zarr attributes for the new group. We could then decide whether this is internally split in two functions, or a single one takes care of all of it (a renaming of the function could then be appropriate).

@jluethi
Copy link
Collaborator Author

jluethi commented Dec 12, 2023

Thanks for the further explanations. I'd say we take this on board for a future refactor of label writing. For now, we made the prepare_label_group function more robust, that should be the limit of the 0.14.0 scope.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants