[napari-workflows] Support 2D processing #149

Closed
tcompa opened this issue Oct 21, 2022 · 13 comments · Fixed by #166

tcompa commented Oct 21, 2022

In the measurement task, we had implemented some (preliminary?) handling of 2D data. Should this also be generally available in napari_workflows_wrapper?


jluethi commented Oct 22, 2022

@tcompa What would this mean? In general, our data may be 2- or 3-dimensional, and we should be able to run workflows on either. What extra handling was required for measurements in the 2D case?


tcompa commented Oct 24, 2022

Sorry, I made it a bit too brief.

The current measurement task (which should be deprecated, once we are happy with napari_workflows_wrapper) includes this kind of logic:

    # Check whether data are 2D or 3D, and squeeze arrays if needed
    is_2D = img.shape[0] == 1
    if is_2D:
        img = img[0, :, :]
        label_img_up = label_img_up[0, :, :]

and then

    for i_ROI, indices in enumerate(list_indices):
        s_z, e_z, s_y, e_y, s_x, e_x = indices[:]
        ROI = (slice(s_z, e_z), slice(s_y, e_y), slice(s_x, e_x))
        if is_2D:
            if not (s_z, e_z) == (0, 1):
                raise Exception("Something went wrong with 2D ROI ", ROI)
            ROI = (slice(s_y, e_y), slice(s_x, e_x))

This is to enable measurements on MIP images/labels: we always store data as 3D arrays (with a dummy Z dimension), but napari_workflows wants actual 2D arrays.

I'm just checking, but my reasonable guess is that:

  1. We should have the same logic also in the new wrapper.
  2. We should enforce the dimensionality to be fixed each time the wrapper is called. This means that we only allow all-3D or all-2D inputs, and not mixed-dimension inputs.


jluethi commented Oct 24, 2022

Yes, we should have the same logic in the wrapper, that makes sense :)

  2. We should enforce the dimensionality to be fixed each time the wrapper is called. This means that we only allow all-3D or all-2D inputs, and not mixed-dimension inputs.

Hmm... Generally agree for most workflows. But there are some workflows that could take in a 3D image and return a 2D label map. And for flexibility, it would be good to accept pipelines that take in e.g. the MIP image, but can produce the segmentation for the whole 3D image (not that this should be a workflow used often, but maybe someone needs their 2D segmentation saved in 3D for down-the-road processing?).

Both of those scenarios should be somewhat of an exception. We could default to not allowing this behavior, but let users enable it somehow?

Also, while we are not tackling time-data yet, maybe we should start thinking about this topic for such design decisions. Eventually, we will also process time-resolved data, so data may be 2D, 3D, 4D or e.g. 2D + time (=> 3 actual dimensions, but maybe saved as 4D with Z dimension = 1)


tcompa commented Oct 25, 2022

Also, while we are not tackling time-data yet, maybe we should start thinking about this topic for such design decisions. Eventually, we will also process time-resolved data, so data may be 2D, 3D, 4D or e.g. 2D + time (=> 3 actual dimensions, but maybe saved as 4D with Z dimension = 1)

Let's keep this issue about 2D napari_workflows, and move the broader discussion e.g. to #150


tcompa commented Oct 25, 2022

  2. We should enforce the dimensionality to be fixed each time the wrapper is called. This means that we only allow all-3D or all-2D inputs, and not mixed-dimension inputs.

Hmm... Generally agree for most workflows. But there are some workflows that could take in a 3D image and return a 2D label map. And for flexibility, it would be good to accept pipelines that take in e.g. the MIP image, but can produce the segmentation for the whole 3D image (not that this should be a workflow used often, but maybe someone needs their 2D segmentation saved in 3D for down-the-road processing?).

Both of those scenarios should be somewhat of an exception. We could default to not allowing this behavior, but let users enable it somehow?

The first possible solution (not necessarily the best) would be to include this information in the input/output specs for this wrapper task. The behavior would be the following (see also the sketch after this list):

  • For each input, check if it is expected to be 2D or 3D. This is the dimensionality required by the numpy/scikit-image functions used in the napari-workflow.
    • If 3D, load the appropriate image/label array (and verify that it is actually 3D).
    • If 2D, load the appropriate image/label array, check its shape, and remove the dummy dimension if needed. In this case, the ROIs should also be reduced to YX-only (as in the snippet in my previous comment).
  • For each label output (at the moment we do not allow image outputs, and dataframes are not really affected by this issue), verify that the shape of the napari-workflow output matches the expected one. If the output is 2D but expected to be 3D, insert a dummy Z dimension before storing it to disk.
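
For concreteness, a minimal sketch of what such specs could look like (just a sketch: the key names, including the dimensionality one, are placeholders and not a final schema):

    # Purely hypothetical input/output specs with a per-item dimensionality hint;
    # key names are placeholders, not an actual schema.
    input_specs = {
        "dapi_img": {"type": "image", "channel": "A01_C01", "dimensions": 2},
        "nuclei_labels": {"type": "label", "name": "nuclei", "dimensions": 2},
    }
    output_specs = {
        "organoid_labels": {"type": "label", "name": "organoids", "dimensions": 3},
    }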

How does that sound? To me it looks reasonable, but I'm not yet sure it covers all use cases.


jluethi commented Oct 27, 2022

For each input, check if it is expected to be 2D or 3D.

I conceptually like this. But is there a way to know whether a function will want a 2D or 3D image? Most numpy/scikit-image functions can handle both after all... If it's reasonably possible to do this check, awesome, let's go with that!
Alternatively, we could have a "force_2d" flag or something like this that could be set? But that could become a bit messy.

For each label output, verify that the napari-workflow output shape matches the expected one

That would be a good check to make. Given that we store everything as 3D (with a Z dim of 1 if it's 2D data only, right?), it would just mean potentially adding the dummy Z dimension, right?

at the moment we do not allow image outputs

Good point actually. I implicitly decided that this should be out of scope for the moment. Theoretically, I can see a use for it and we may want to tackle this eventually, e.g. one could do illumination correction via napari workflows. But I'm hesitant about this: it could get quite messy, e.g. adding many new image channels through workflows. Thus, at the moment, we only allow label outputs (they compress very well and are clearly separate from the channels) and dataframe outputs.


tcompa commented Oct 27, 2022

Based on this discussion, I tried to explore all the branches of I/O dimensionality handling, and I am putting my most up-to-date proposed solution below. Comments are always welcome, and I'll refrain from any implementation until we reach a reasonably precise definition.

Image outputs

This one is easy: they are currently out of scope.

Image/label inputs

For each image/label input, we may add an expected_dimensions key to the corresponding input_specs entry, taking the value 3 or 2. The input handling would then be as follows:

  • If expected_dimensions=3, loading takes place as usual. We may want to add a warning like "Warning: array is 3D but first dimension has size=1. Was it meant to be 2D?" (if input.shape[0]==1). But that would not be strictly necessary.
  • If expected_dimensions=2, loading takes place as usual, then we add a hard constraint that input.shape[0]==1. If True, remove the first dimension. If False, the task fails.
  • During the loop over ROIs, if expected_dimensions of a certain input is equal to 2, then the 3D ROI is transformed into a 2D ROI, after checking that it has a dummy Z extent (and failing otherwise).

In this way, for instance, we could prepare a workflow that also measures perimeter (for MIP images/labels), and make sure that we only run it on 2D inputs.
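
In code, the input handling could look roughly like this (a minimal sketch with placeholder names, not the actual task implementation):

    import logging
    import numpy as np

    def load_input_array(array, expected_dimensions: int) -> np.ndarray:
        # Sketch only: per-input handling based on a hypothetical
        # expected_dimensions value (2 or 3).
        array = np.asarray(array)
        if expected_dimensions == 3:
            if array.shape[0] == 1:
                logging.warning(
                    "Array is 3D but its first dimension has size 1. "
                    "Was it meant to be 2D?"
                )
            return array
        if expected_dimensions == 2:
            if array.shape[0] != 1:
                raise ValueError(
                    f"Expected a dummy Z dimension, but shape is {array.shape}"
                )
            # Remove the dummy Z dimension
            return array[0, :, :]
        raise ValueError(f"Invalid expected_dimensions={expected_dimensions}")

The ROI handling in the loop would then mirror the is_2D branch of the measurement snippet above, just keyed on the per-input expected_dimensions instead of a global flag.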

    Most numpy/scikit-image functions can handle both after all...

Not all napari-workflows are fully general in the input dimensionality, so I guess that even if we don't set this expected_dimensions parameter we will still (sometimes) have to reduce the dimensionality of an array to make it 2D. The obvious example coming to mind is the measurement of perimeters, which cannot take place unless you reduce the dimensions to 2D. If we consider this to be part of a small set of not-so-relevant examples, we can also ignore it.

Note 1

The proposed solution is equivalent to what I would expect from @jluethi's force_2D one. We can choose either one and obtain the exact same functionality (e.g. force_2D defaults to False, and if it is True then we go back to my list of 2D/3D cases).

Note 2

The discussion seems to lead to a choice of not enforcing any additional constraint like "all inputs/outputs should have the same dimensionality". We would then allow mixed-dimensionality workflows. If some inputs are combined within a workflow (e.g. when doing the difference between two image arrays) and they have a shape-mismatch which cannot be handled, then it's up to the napari-workflow/scikit-image/numpy functions to raise an error.
I am fine with this choice, if that was the intended one.

Label outputs

Also for label outputs, we may add the expected_dimensions key (or an equivalent force_2D one) to each output_specs entry. At the end of the napari-workflow, we check that each output has the right dimensionality (sketched below, after the list):

  • If the output array is 3D and expected_dimensions=3, save it as usual.
  • If the output array is 2D and expected_dimensions=2, then everything is under control. We add a new dummy (Z) dimension, and save it as a 3D array.
  • If the output dimensions differ from expected_dimensions, the task fails. Or, possibly, we just fall back on the most reasonable choice, ignoring the expected_dimensions parameter and raising a warning (but in that case we may as well handle everything automatically, see the note below).
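
A minimal sketch of this output handling (again with placeholder names, and assuming labels are handled as numpy arrays before writing):

    import numpy as np

    def prepare_label_output(label_img, expected_dimensions: int) -> np.ndarray:
        # Sketch only: check a label output against the hypothetical
        # expected_dimensions value, and make it 3D before writing to disk.
        label_img = np.asarray(label_img)
        if label_img.ndim == 3 and expected_dimensions == 3:
            return label_img
        if label_img.ndim == 2 and expected_dimensions == 2:
            # Re-insert the dummy Z dimension, so that on-disk data stay 3D
            return label_img[np.newaxis, :, :]
        raise ValueError(
            f"Output has {label_img.ndim} dimensions, "
            f"but expected_dimensions={expected_dimensions}"
        )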

Note

In the case of outputs, we could also decide that this handling is done automatically (without any expected_dimensions parameter). If you receive 3D output, you save it as it is. If you receive 2D output, you add a dimension and then save it.
Adding this parameter would only make things more transparent to the user, but if we think it's just making things more complex we can skip it.


jluethi commented Oct 27, 2022

Thanks for the great overview! Adding additional parameters makes things more complex, but reading this, I'd say I'm fine with adding the expected_dimensions parameter (also a better name than my force_2D). Can this parameter be optional? Probably best to have expected_dimensions=3 as a default (that never fails, but sometimes isn't what the user wants) and add the warning about the shape when the Z dimension size is 1.

Not all napari-workflows are fully general in the input dimensionality, so I guess that even if we don't set this expected_dimensions parameter we will still (sometimes) have to reduce the dimensionality of an array to make it 2D. The obvious example coming to mind is the measurement of perimeters, which cannot take place unless you reduce the dimensions to 2D. If we consider this to be part of a small set of not-so-relevant examples, we can also ignore it.

I don't think we can know which workflows would require which dimensionality. The user could basically put any dask graph chaining relevant function calls together into a napari workflow. I think for the time being, we assume the user created the workflow on the same data modality that they load in.

Note 2: The discussion seems to lead to a choice of not enforcing any additional constraint like "all inputs/outputs should have the same dimensionality".

I think it will mostly be used as a global parameter like level. Not forcing it to be global does give potential extra flexibility though, e.g. loading a 2D label mask and then 3D images. We currently couldn't really do this (because we have the MIPs in a separate OME-Zarr file), but it would make sense to have them in the same OME-Zarr file eventually (though I have not yet figured out what the right way of combining them would be).
If we allow for this potential flexibility, then we trust the user to set this setting correctly.

In the case of outputs, we could also decide that this handling is done automatically (without any expected_dimensions parameter). If you receive 3D output, you save it as it is. If you receive 2D output, you add a dimension and then save it.
Adding this parameter would only make things more transparent to the user, but if we think it's just making things more complex we can skip it.

I think we can skip this; the user shouldn't be concerned with that part, because the only thing the choice can do is trigger an error when the wrong choice is made.


tcompa commented Oct 28, 2022

All right, then I will proceed as discussed.


tcompa commented Nov 7, 2022

I propose a new change to how we are implementing this.

For both the level and relabeling parameters, we started by thinking that we would set them per I/O item, and then we ended up setting them "globally" (i.e. at the task level). Should we do the same for the expected_dimensions=2/3 parameter?

This would mean that a workflow works either on 2D image/label arrays or on 3D image/label arrays (note that dataframes don't have a dimensionality). Everything is always written to disk as 3D (the only Zarr structure we support at the moment), but the global expected_dimensions parameter is used to go through the several branches of #149 (comment).

This would obviously simplify things a bit.
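
Schematically, it would look something like this (just a sketch; the actual signature, name and default are to be settled in the implementation):

    # Sketch only: a task-level ("global") parameter instead of per-item keys
    # in input_specs/output_specs. Names and defaults are placeholders.
    def napari_workflows_wrapper(
        input_specs: dict,
        output_specs: dict,
        expected_dimensions: int = 3,
    ):
        if expected_dimensions not in (2, 3):
            raise ValueError(f"Invalid expected_dimensions={expected_dimensions}")
        # The same value then applies to all inputs (squeeze the dummy Z axis,
        # reduce ROIs to YX) and to all label outputs (re-expand to 3D on disk).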

@jluethi: any feedback on this? Should we (already) support mixed-dimensions workflows?


jluethi commented Nov 7, 2022

This would mean that a workflow works either on 2D image/label arrays or on 3D image/label arrays

Hmm, a bit of a tricky question. expected_dimensions is only used for inputs at the moment, right? In our current setup, we have 2D Zarrs & 3D Zarrs and we'd never mix them. We want to eventually move to a point where we can e.g. use a 2D organoid segmentation to get the ROI for an organoid in x & y, then do 3D cell segmentation within that. In that case, we may need to load the 2D label image and the 3D intensity image. But maybe we also switch this to a way where we just load ROI information? Not fully sure yet.

=> The simplification would make sense for the current setup, but we may eventually need this complexity in some edge-cases


tcompa commented Nov 7, 2022

expected_dimensions is only used for inputs at the moment, right?

No, see #149

In our current setup, we have 2D Zarrs & 3D Zarrs and we'd never mix them. We want to eventually move to a point where we can e.g. use a 2D organoid segmentation to get the ROI for an organoid in x & y, then do 3D cell segmentation within that. In that case, we may need to load the 2D label image and the 3D intensity image. But maybe we also switch this to a way where we just load ROI information? Not fully sure yet.
=> The simplification would make sense for the current setup, but we may eventually need this complexity in some edge-cases

Ok, thanks.
Since we don't have a full understanding of the "complex" use case at the moment (i.e. whether to start by loading arrays or ROIs), I am putting in the simplified version for now. Let's open an issue about the intended "complex" use case and continue over there.


jluethi commented Nov 7, 2022

Ok, let's see if it comes up as a limitation when we process mixed 2D vs 3D workflows then :)

Repository owner moved this from TODO to Done in Fractal Project Management Nov 7, 2022