OME-Zarr parsing has non-deterministic assignments of channels per well #61
As per our last call today: With 6d176a7, I only scratched the surface of the inconsistent channel assignment issue. To be honest, I don't even know if it is enough, and in any case it is obviously not robust. For now, let's proceed this way:
Example: The user provides this table
Images in a given well have filenames ending with Notes:
|
(@jluethi: no need that you run the 23-well example with the current code, since there's a lot of things that will have to change anyway) |
Quick comment upon thinking about this: As soon as we allow a well to have channels This means that we should call Also, we should modify the pyramid creation to work with 3D arrays (which is a trivial change in #53), and obviously the The other option is that we keep the current behavior, where Briefly, the main options are
IMPORTANT: This is quite a relevant change, so I'll wait for some feedback before proceeding. |
This is a very important discussion @tcompa , you are right.
This is tricky. At least the current implementation of the ome-zarr-reader assumes that the folder names (e.g. {0,1,2}) are consistent between wells and it will put {0} of one well into the same napari layer as {0} of a different well. I think it does get a bit simpler though, because we don't really care about the C01, C03, C04 part anymore if they are labeled and parsed consistently. Here's my vision of this
Before parsing, we check through all the channels and decide a fixed mapping
The Zarr file then always contains the channel under these folder numbers, every folder has these channels (lowest level of the folder tree displayed here:
Both the name of the channel (e.g. DAPI) and the C01 (plus additional info) are saved into the OMERO metadata per field of view. The number like C01 probably doesn't need to be accessed much (maybe for some fancy inference of what channel it is for illumination correction, but the user can also look that up based on the label (e.g. DAPI) they gave and their original label table). As far as Fractal is concerned, it is handling channel 0, channel 1 or channel 2, as saved in Zarr. As long as the A01_C01 that the user names DAPI is always put into channel 0 during the initial parsing, we just rely on this from then on. Hope this explanation helps. If not, let's quickly have a zoom call (I'd be available just now or at 3pm again) :) |
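To make the "fixed mapping decided before parsing" idea concrete, here is a minimal sketch. It assumes the channel identifiers (like A01_C01) have already been extracted from the filenames of all wells; the function name `build_channel_mapping` and the input list are hypothetical and not taken from the actual code.

```python
def build_channel_mapping(channel_ids):
    """Assign every distinct channel identifier a fixed index.

    Sorting the distinct identifiers makes the result independent of the
    order in which wells or files happen to be scanned.
    """
    return {cid: idx for idx, cid in enumerate(sorted(set(channel_ids)))}


# Hypothetical identifiers collected over all wells, in arbitrary order.
found = ["A01_C03", "A01_C01", "A01_C04", "A01_C01", "A01_C03"]
mapping = build_channel_mapping(found)
print(mapping)  # {'A01_C01': 0, 'A01_C03': 1, 'A01_C04': 2}
```

With such a mapping, A01_C01 always ends up in Zarr subfolder 0, whichever well it comes from and in whatever order the files are listed.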
Bit of added complexity: What if there are more channels? e.g. multiplexing, there is DAPI in every cycle. Or some channels are not imaged in all wells? There should in my opinion still be a definitive mapping of channel inputs (e.g. A04_C05 in case of a new channel) to an index in Zarr, e.g. now A04_C05 => 3. And Zarr only creates the folders for which channels exist (would need to test with the reader plugin if that can handle missing channels per well). If there was multiplexing done, the user may have 3 folders, each of them containing
In that case, we could map something like this:
|
Handling missing channels is a bit of an optional use case, let's see if it's easy, otherwise we do it later |
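A rough sketch of how the definitive mapping could be extended when a new channel shows up, and how only the existing channels of a well would get a folder. Both helper names are hypothetical, and whether the reader plugin copes with missing folders is untested, as noted above.

```python
def extend_mapping(mapping, new_ids):
    """Return a copy of the mapping where unseen identifiers get the next free index."""
    extended = dict(mapping)
    next_index = max(extended.values(), default=-1) + 1
    for cid in new_ids:
        if cid not in extended:
            extended[cid] = next_index
            next_index += 1
    return extended


def folders_for_well(mapping, well_channel_ids):
    """List the Zarr subfolder names for the channels that exist in one well."""
    return [str(i) for i in sorted(mapping[cid] for cid in well_channel_ids)]


base = {"A01_C01": 0, "A01_C03": 1, "A01_C04": 2}
extended = extend_mapping(base, ["A01_C01", "A04_C05"])
print(extended["A04_C05"])  # 3
print(folders_for_well(extended, ["A04_C05", "A01_C01"]))  # ['0', '3']
```

Existing indices never change when a channel is added, so data that was already written stays valid.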
After a quick Zoom call with @jluethi, here's a scheme for the first implementation. Assumptions:
Scheme:
Note that, at the moment, the pyramid-creation function remains the same (acting on 4D arrays), since it doesn't care about the channel names/indices. |
I think multiple calls to See this example:
Option 1 works, of course, but it neglects the actual channel indices. I think the standard specs require us to keep the |
Thanks for the explanation @tcompa. Idea behind the wrap up task: |
Going through part of the parsing, a wrap-up metadata parsing task may actually not be feasible, because some metadata needs to be related to the original filenames, thus probably needs to be parsed together with the file reading to be able to match them properly. |
Quick information: if channel subfolders are not like |
Ok. I'd suggest we implement the case where we have continuous channels 0 to n for now. And then worry about missing channels in a well if that use case arises :) |
I'll start working on this, in dev-channels. |
A first version is almost ready in dev-channels (see commits in https://github.com/fractal-analytics-platform/mwe_fractal/commits/dev-channels). At the moment I only updated:
@jluethi, could you provide a set of realistic labels?
|
Hey @tcompa Here is some example omero metadata I wrote for this dataset:
You can take over just the subset of the information that is relevant here, e.g. just the The matching would be: |
Note to future self: |
Channel labels (written together with the other omero metadata) are read correctly within napari. That's a good check. The next step I have on the list is to move from hard-coded channel information to something user-provided. And then a couple more tasks need to be updated to the new channel scheme. |
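For illustration, a minimal sketch of how per-channel labels could be laid out so that entry i describes Zarr channel i. The `channels` / `label` / `active` keys follow my understanding of the omero block in OME-NGFF; the helper name and the label-to-index assignment are assumptions, not the actual task code.

```python
def build_omero_channels(labels_by_index):
    """Build omero-style channel entries, ordered by Zarr channel index.

    labels_by_index maps the channel index (0, 1, 2, ...) to its label.
    """
    channels = []
    for index in sorted(labels_by_index):
        channels.append({"label": labels_by_index[index], "active": True})
    return {"channels": channels}


# Hypothetical assignment: channel 0 is DAPI, channel 1 is Lamin B1.
omero = build_omero_channels({1: "Lamin B1", 0: "DAPI"})
labels = [ch["label"] for ch in omero["channels"]]
print(labels)  # ['DAPI', 'Lamin B1']
```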
(this is all under the no-missing-channel assumption) |
On the Pelkmans lab side, we always worked under a "no missing channels" assumption in our processing in the past, so while we could think about making this more flexible at some point, it wouldn't be a step back for us. Thus, I would suggest we accept the "no missing channels" limitation and maybe even include a check for it for the moment, such that the parsing would complain if channels are missing. If we ever need to support those use cases later, we can come back to this, but it isn't a priority from my side. |
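Such a check could look roughly like the sketch below. It assumes the parser has already collected, for each well, the set of channel identifiers found in its filenames; the function name and the input structure are hypothetical.

```python
def check_no_missing_channels(channels_per_well, expected):
    """Raise ValueError if any well lacks one of the expected channels."""
    expected_set = set(expected)
    for well, found in sorted(channels_per_well.items()):
        missing = expected_set - set(found)
        if missing:
            raise ValueError(
                f"Well {well} is missing channels: {sorted(missing)}"
            )


expected = ["A01_C01", "A01_C03"]
complete = {"B03": {"A01_C01", "A01_C03"}, "C06": {"A01_C03", "A01_C01"}}
check_no_missing_channels(complete, expected)  # passes silently

incomplete = {"B03": {"A01_C01", "A01_C03"}, "C06": {"A01_C01"}}
try:
    check_no_missing_channels(incomplete, expected)
except ValueError as err:
    message = str(err)
    print(message)  # Well C06 is missing channels: ['A01_C03']
```

Failing early at parsing time keeps all later tasks free to assume that every well has the same channels.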
Update on the current status: The user should provide a dictionary like this:
At the moment this dictionary should be complete, but then we can obviously set some default choices for
The first task (
Let's not focus on the structure (first column is probably redundant, and maybe list of lists can be simplified), but on the fact that this table contains the list of all channels which are present in the dataset. The assumption is that they are also present for each well, and this is explicitly verified (throwing an exception otherwise). From now on, all tasks receive Channels are always sorted according to this order, in all wells, and upon writing the zarr files subfolders are named 0,1,2,... A few things still need to be streamlined, but this implementation is almost ready. Does such a scheme seem reasonable? Comments are welcome :) |
The scheme looks very reasonable to me for parsing the data in.
That's great! The only thing I'm not sure about is:
This information is part of the Zarr file once it's written, no? So there would be no need to pass it again and find places to save it in between (/ read it from file again).
Great! Making arguments optional eventually is a good idea, because we can e.g. create a task that infers |
Thanks. Then I'll move on and finalize it as much as possible, aiming for pushing to
Agreed, this structure is highly redundant (it was the very first try). What is clearly not needed is:
At the moment we still need the sorted list Two questions:
|
How about for the moment, both the initialization task and the
Hmm, I don't know how the omero part deals with custom metadata keys, but that is another place we could put it. Not 100% sure how we'd save this information (i.e. A01_C01 may not be specific enough for all downstream uses of channel information) => could be a start, but not yet complete.
That is actually slightly more complicated. A01_C01 describes a specific channel for a specific experiment, but depending on some microscope settings & user inputs, A01_C01 in one experiment is not the same wavelength of light & filter set at the microscope as A01_C01 in a second experiment. So we can't directly match on that, we will need additional metadata for direct matching. |
Current status (as of b3fd38e) What is there:
What is not there:
For me this version is good enough to be merged into |
Deterministic channel assignment across wells/plates [ref #61]
With 386569c, a minimal working example is available in |
@tcompa I ran the 2x2 test case and it ran through successfully and added the Omero metadata. I then tried the 10-well, 5x5 site test case. It also runs through, but doesn't save any omero metadata to the .zattrs. Any idea what could be going wrong? I gave them the same channels.json file, so that shouldn't make a difference. I have pushed it to main (there was no |
Thanks for testing! It should work now. EDIT: here's the final MIP, with napari correctly reading channel labels: |
@tcompa Yes, also works on my side now. Really cool to see the channels parsed this way and rescaled nicely from the start! We can refactor later to decide how we pass channel lists in detail, how we handle multiplexing with this and refactor how the inputs are originally provided (i.e. do we stay with one json file per task or start with a combined json file setup) |
This was actually much simpler than we thought back then, see

    import json
    d = {1: True}
    with open("/tmp/out.json", "w") as f:
        json.dump(d, f)

leads to

    $ cat /tmp/out.json
    {"1": true}

We were probably confused by other things, since this is actually trivial (as it should be). |
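One thing to keep in mind with this: JSON object keys are always strings, so integer keys come back as `str` after a round trip. A small sketch of converting them back (the dictionary content is just an illustration):

```python
import json

# Integer channel indices as keys (illustrative content).
channel_labels = {0: "DAPI", 1: "Lamin B1"}

encoded = json.dumps(channel_labels)
print(encoded)  # {"0": "DAPI", "1": "Lamin B1"}

decoded = json.loads(encoded)
print(list(decoded))  # ['0', '1'] -> keys are strings now

# Convert the keys back to integers after loading.
restored = {int(key): value for key, value in decoded.items()}
print(restored == channel_labels)  # True
```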
dev to main (require fractal-tasks-core 0.1.4)
The current OME-Zarr parsing has no fixed mapping between the channel name in the original filename (or any channel metadata we may want to include) and the channel number it assigns when parsed to an OME-Zarr file. That leads to the bug where images assigned to channel 1 in well B03 may be from a different channel than images assigned to channel 1 in well C06.
Because channels are displayed as layers in napari, that then leads to a layer containing different information depending on the well, which we need to fix.


See here: The same channel is DAPI (nuclear marker) vs. Lamin B1 (nuclear envelope marker), depending on the well in the 23 well dataset (=> one looks like solid, round objects, the other is more pronounced at the rim of the object).
What we would like to achieve: