OME-Zarr parsing for sparsely recorded fields #8


Closed
gusqgm opened this issue Apr 22, 2022 · 20 comments
Assignees
Labels
enhancement New feature or request

Comments


gusqgm commented Apr 22, 2022

Can .zarr containers store only the data for the fields that actually contain data within a well? This would require the Yokogawa experiment parser to extract (z, y, x) stage positions and assign the corresponding values to the corresponding fields.

As a first step, we need to parse the x, y, width, and height from the metadata file of Yokogawa experiments; one example is found in the yokogawa_image_collection_task from the Liberali Workflows repository. All metadata is saved in the so-called mlf_frame file.

In the simplest case, the sparse fields still fall onto a grid pattern, so the parsing should be able to extrapolate the size of the entire grid and place the recorded fields in the correct grid positions.

Currently, Drogon creates an overview with 0's where fields were not recorded, filling up the gaps. The saved overviews also include a mask overview, which is used to retain only the imaged fields for processing and to avoid overfilling memory.

One interesting question here is whether OME-Zarr can actually deal with sparse field grids, meaning that the empty fields would remain empty rather than being assigned same-sized stacks filled with 0's.

@gusqgm gusqgm added enhancement New feature or request question labels Apr 22, 2022
@gusqgm gusqgm changed the title OME parsing for sparsely recorded field grids OME-ZARR parsing for sparsely recorded field grids Apr 22, 2022
@gusqgm gusqgm changed the title OME-ZARR parsing for sparsely recorded field grids OME-Zarr parsing for sparsely recorded field grids Apr 22, 2022
@gusqgm gusqgm removed the question label Apr 22, 2022

gusqgm commented Apr 22, 2022

In the case of field grids, if fields are sparse, the area in between should consist of regions with no information, so that we can still provide a correct overview estimation.
This means that we need to:

  1. Parse the (z, y, x) field position from the stage position in the microscope metadata and use it to create the sparse grid.
  2. Create empty regions as spacers for the data (does .zarr allow for sparse image writing/reading?).
  3. Generate the grid, naming the fields as a matrix and storing the field grid position as metadata in the .zarr -> only recorded images are saved on disk, and no extra data is created for the spacers.
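The grid-placement step (1) above could be sketched roughly as follows. `stage_positions_to_grid` is a hypothetical helper, and the regular field pitch (spacing) is assumed to be known from the metadata:

```python
import numpy as np

def stage_positions_to_grid(xs, ys, pitch_x, pitch_y):
    """Map stage positions of the recorded fields onto (row, col) indices
    of a regular grid. Cells with no recorded field simply get no entry."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    # Round to the nearest grid cell, anchored at the minimum position
    cols = np.rint((xs - xs.min()) / pitch_x).astype(int)
    rows = np.rint((ys - ys.min()) / pitch_y).astype(int)
    # Extrapolate the full grid extent, including never-imaged cells
    shape = (int(rows.max()) + 1, int(cols.max()) + 1)
    return [(int(r), int(c)) for r, c in zip(rows, cols)], shape
```

For a 2x3 well where one field was skipped, the returned shape still covers the full grid while only the recorded fields get a grid index.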

@gusqgm gusqgm changed the title OME-Zarr parsing for sparsely recorded field grids OME-Zarr parsing for sparsely recorded fields Apr 25, 2022

gusqgm commented Apr 25, 2022

For gridless field acquisitions, the microscope tries to center the fields around the samples, and we currently also keep a mask so that we can separate the imaged fields from non-imaged areas, as seen in this example:

220314EMS005_220317_161558_E02_T0001F001L01A02Z01C02_COMBINED

We can possibly just adapt the current Liberali Workflow code that parses the stage positions for the creation of overviews.


jluethi commented May 9, 2022

High level description from @gusqgm (moved into the issue for clarity):

Most experiments from the Liberali Lab deal with sparse data, i.e. samples are scattered throughout space.

If this is on a grid, during image acquisition in the microscope, fields that do not contain samples matching certain criteria are not imaged, so that in the end we have a sparsely acquired field grid.
For grid-less sparse acquisitions, fields are acquired centered on particular samples without following a grid pattern.
We need to be able to parse this data properly so all imaged fields are still assigned to their correct grid position during the parsing and the saving into the zarr container.


jluethi commented May 18, 2022

@gusqgm Do we have a test set of this kind of data? Is this what's on the sftp share at the moment?
If so, @mfranzon is it already on the Fractal share?
Let's document here where the test data is, as soon as it's on the share :)


gusqgm commented May 19, 2022

@jluethi, the dataset called 20220316_sec_FOCM_test-R1_E2 within gridless_Yokogawa_recording_FMI has been uploaded, and @mfranzon has been copying it to the UZH server.
It contains well E02, which is represented in the image above. The raw data is in the TIF folder, already in compressed form.


gusqgm commented May 20, 2022

Regarding the microscope acquisition:

the image above CAN have overlapping FOVs...
Which brings me to the following thought: we could have an identifier for each FOV, record the positional information of each one in the metadata, and on top add a flag saying which other FOV ids might overlap with this one. Would this be a viable option?
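A minimal sketch of this overlap-flag idea, using axis-aligned bounding boxes; `overlapping_fov_ids` and the (x, y, width, height) layout are assumptions for illustration, not an existing API:

```python
def overlapping_fov_ids(fovs):
    """fovs maps a FOV id to its (x, y, width, height) in stage units.
    Returns, per id, the ids of all FOVs whose bounding boxes overlap it;
    this list could be stored as a per-FOV flag in the metadata."""
    overlaps = {fid: [] for fid in fovs}
    ids = list(fovs)
    for i, a in enumerate(ids):
        ax, ay, aw, ah = fovs[a]
        for b in ids[i + 1:]:
            bx, by, bw, bh = fovs[b]
            # Two boxes overlap iff their intervals overlap on both axes
            if ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah:
                overlaps[a].append(b)
                overlaps[b].append(a)
    return overlaps
```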


gusqgm commented Jun 13, 2022

Hi all, here is an example of a Search First experiment where the microscope acquires a sparse regular grid:

A06-overview_MIP_labeled

As you can see, there are 15 FOVs where 4 have not been recorded.

Important: To make this image I have simply opened plane number 10 of each of the recorded fields and assembled them according to the Overview created by our luigi overview workflow. Now here comes the catch: I have added on top of the image the corresponding part of the file name associated with the FOV number, and, as you can see, we have an issue with the "...F006...", "...F007...", "...F008..." images, as they should have been shifted by 1 in order to fit the expected zig-zag naming pattern.

This is an important finding, as it means that we cannot simply rely on the file naming when parsing the microscope data, at least for sparse grid imaging experiments. Rather, we need to use the microscope metadata (so this directly connects with #25).

I am currently checking the code of the image collection task and the overview creation task, and will write more details below.

Anyway, for now the data has been added to the sftp server; details are being sent via email to you, @mfranzon.


gusqgm commented Jun 13, 2022

Here some brief information about the test data shown above:

There are 3 subfolders:

220304_172545_220304_172605
220304_172545_220304_172605_segmentation
220304_172545_220304_175557

The first two correspond to the usual output coming from the search-first part of the acquisition, and should not be considered initially. Distinguishing them should be easy, as they share the same time fingerprint.
The last folder contains all of the imaging data and the associated metadata in one place.

Please note that all folders reflect a dataset where the barcoding reader failed and a date-name was added in it, following issue fractal-analytics-platform/fractal-client#48 .

I have adapted the .mlf metadata to represent the single-well dataset; this runs within Drogon, so it should currently be parsed correctly.

Please note that all of the other metadata are kept unchanged for the time being, and still point to the full plate information.


tcompa commented Jun 14, 2022

Thanks @gusqgm for these details. Quick question:

I am currently checking the code of the image collection task and the overview creation task, and will write more details below.

Are we supposed to have access to those repositories?


jluethi commented Jun 14, 2022

If you don't have access yet, I can go through this code today or tomorrow and put together a minimal parsing example, such that the above dataset is parsed correctly. I'd then hand over the example so we can wrap it into a Fractal function :)

But also, let's make sure you are added to the repositories so you can have access if needed in the future :)


tcompa commented Jun 14, 2022

If you don't have access yet, I can go through this code today or tomorrow and put together a minimal parsing example, such that the above dataset is parsed correctly. I'd then hand over the example so we can wrap it into a Fractal function :)

My to-do list is not so thin at the moment, so I'd say take your time.

But also, let's make sure you are added to the repositories so you can have access if needed in the future :)

Sure, thanks.


gusqgm commented Jun 14, 2022

Update from my side:
@tcompa you have been invited to the repository, my bad.

Regarding the implementation of correct metadata parsing and field assignments: @jluethi and I agreed to go through the current code in the aforementioned repository so that we can filter out the most important parts and perform minimal tests as a consistency check. Once this is done, we will pass them on to you, as @jluethi mentioned.


jluethi commented Jun 17, 2022

Update on the parsing of metadata (from the sparse example above as well as other Yokogawa metadata) in this issue: https://github.com/fractal-analytics-platform/mwe_fractal/issues/46


jluethi commented Aug 29, 2022

Quick note: when we save sparse arrays, let's make sure we use the write_empty_chunks=False option and a fill value of 0 (see here: https://zarr.readthedocs.io/en/stable/tutorial.html#empty-chunks)
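For illustration: with write_empty_chunks=False, chunks that equal the fill value everywhere are never materialized on disk. A numpy-only sketch of that chunk-skipping decision (this mimics the idea, it is not zarr's actual implementation):

```python
import numpy as np

def chunks_to_write(canvas, chunk_shape, fill_value=0):
    """Return the (cy, cx) indices of chunks that differ from fill_value
    somewhere and therefore need to be stored; all-fill spacer chunks are
    skipped, which is what write_empty_chunks=False achieves in zarr."""
    n_cy = -(-canvas.shape[0] // chunk_shape[0])  # ceiling division
    n_cx = -(-canvas.shape[1] // chunk_shape[1])
    written = []
    for cy in range(n_cy):
        for cx in range(n_cx):
            block = canvas[cy * chunk_shape[0]:(cy + 1) * chunk_shape[0],
                           cx * chunk_shape[1]:(cx + 1) * chunk_shape[1]]
            if np.any(block != fill_value):
                written.append((cy, cx))
    return written
```

For a sparsely recorded well, only chunks touched by imaged FOVs would end up in this list.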

@jluethi jluethi transferred this issue from fractal-analytics-platform/fractal-client Sep 2, 2022

jluethi commented Sep 13, 2022

To facilitate tackling this issue:

  1. We need to fix "Remove rows & cols parameter from yokogawa to Zarr" (#13) to use the metadata instead of row & col parameters (let's not throw away the row & col logic: it's useful to have, but it should not be the default and should not be used when metadata is available).
  2. Let's extend this to make sure it can also handle a search-first (grid-based) dataset: see "Search-first 1)" in "Overview Test Datasets" (fractal-client#213).
  3. Let's test it on a larger search-first dataset: see "Search-first 2)".

Let's discuss 1) in #13

For 2):
Careful, this dataset has different channels. Until we have fixed #5, let's make sure we specify them manually.
When the processing works correctly, it should look like this (see also above from @gusqgm):
A06-overview_MIP_labeled

Still like a grid, but not all parts have content. We will need to ensure that we initially build an empty array of the correct size, to be able to fill in all the positions. And we can't make assumptions based on the count of FOVs or just look at the first and last (otherwise, we're not tackling part 3)). => We will need to look at the metadata table and find the top-left and bottom-right corners based on all the x & y positions and the image sizes.
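That corner-finding step could be sketched as below, assuming per-FOV x/y stage positions and image sizes in the same physical units (`canvas_extent` is a hypothetical helper name):

```python
import numpy as np

def canvas_extent(x_pos, y_pos, widths, heights):
    """Top-left and bottom-right corners of the full well canvas, computed
    from all FOV positions and sizes rather than from the first/last FOV.
    Negative stage coordinates (e.g. X=-1448.3) are handled naturally."""
    x_pos = np.asarray(x_pos, dtype=float)
    y_pos = np.asarray(y_pos, dtype=float)
    widths = np.asarray(widths, dtype=float)
    heights = np.asarray(heights, dtype=float)
    top_left = (float(x_pos.min()), float(y_pos.min()))
    bottom_right = (float((x_pos + widths).max()),
                    float((y_pos + heights).max()))
    return top_left, bottom_right
```

The canvas size then follows as bottom_right minus top_left, independent of how many FOVs were skipped in between.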


tcompa commented Sep 15, 2022

We will need to look at the metadata table and find the top-left and bottom-right corners based on all the x & y positions and the image sizes.

At the moment, this happens based on the FOV-ROI indices rather than on the x/y physical positions. In the parsing task there is a block like

    adata = read_zarr(f"{zarrurl}/tables/FOV_ROI_table")
    fov_rois = convert_ROI_table_to_indices(adata, full_res_pxl_sizes_zyx=pxl_size)

    max_x = max(roi[5] for roi in fov_rois)
    max_y = max(roi[3] for roi in fov_rois)
    max_z = max(roi[1] for roi in fov_rois)
    [...]
    canvas = da.zeros(
        (max_z, max_y, max_x),
        dtype=sample.dtype,
        chunks=(1, chunk_size_y, chunk_size_x),
    )

where 5, 3, 1 are the indices corresponding to end_x, end_y, end_z, as in roi = [start_z, end_z, start_y, end_y, start_x, end_x]. Note that the max_z part is a bit redundant, as all FOVs should have the same number of Z planes.

@jluethi, do you notice anything unexpected in the way we are doing it?


jluethi commented Sep 15, 2022

I'd be a bit cautious with using an index-based selection of the columns here. It assumes that the ordering of the AnnData ROI tables will always be the same. While our system should be consistent here, we may want to support other OME-Zarr files that contain tables with the same information, and the order of columns shouldn't be part of that spec.
=> Can we switch to selecting columns based on their names in var? Also, can we implement it such that we have default column names that can be overridden via an optional input? (Because the spec for how those columns are named may change in the future.)
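A rough sketch of name-based column selection with overridable defaults. The plain array/`var_names` stand-in and the default column names below are assumptions for illustration, not the actual Fractal table spec:

```python
import numpy as np

# Hypothetical default names for the ROI columns; the real table spec may
# use different names, which is exactly why they can be overridden.
DEFAULT_ROI_COLUMNS = ("x_micrometer", "y_micrometer", "z_micrometer")

def select_roi_columns(table, var_names, columns=DEFAULT_ROI_COLUMNS):
    """Select ROI columns by name instead of by position, so the result
    does not depend on the column order of the underlying table."""
    name_to_idx = {name: i for i, name in enumerate(var_names)}
    idx = [name_to_idx[c] for c in columns]
    return np.asarray(table)[:, idx]
```

With an AnnData table, `table` and `var_names` would come from `adata.X` and `adata.var_names`, and a reordered table yields the same result.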

It's good to have max_z, as it may vary between wells; see e.g. the test case /data/active/fractal/3D/PelkmansLab/CardiacMultiplexing/Cycle1_5x5_10wells (fractal-analytics-platform/fractal-client#213). Mostly I expect FOVs within a well to be consistent, but I'm not even sure that will always be the case, and it is not required by the specification.

Given that it handles the 2x2 case, it seems able to handle negative coordinates; the 2x2 case has X positions like X="-1448.3", so that's good :)
The general canvas definition and max-position finding look good to me.

tcompa added a commit that referenced this issue Sep 19, 2022
…g get_ROIs_bounding_box (ref #8) to lib_regions_of_interest

tcompa commented Sep 19, 2022

As of 5c1410e, the new function get_ROIs_bounding_box by @mfranzon and me is meant to take care of this last comment.

Missing features:

  1. The column names can be overwritten, but at the moment this is not exposed to the user.
  2. For now, we are not shifting ROI positions (to make them start from zero), because our AnnData tables are already shifted (that happens in prepare_FOV_ROI_table). We probably need to be more general if we want to read zarr files produced outside Fractal, but I'd say we can postpone this feature.
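The origin shift that point 2 postpones could look roughly like this; `shift_rois_to_origin` is a hypothetical helper, assuming an (n_rois, 3) array of (z, y, x) start positions:

```python
import numpy as np

def shift_rois_to_origin(starts):
    """Shift ROI start positions so each axis starts at 0, for tables
    that were produced outside Fractal and are not pre-shifted."""
    starts = np.asarray(starts, dtype=float)
    return starts - starts.min(axis=0)
```

Tables that are already shifted (minimum 0 on each axis) pass through unchanged.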


jluethi commented Sep 19, 2022

Sounds good. I made an issue describing when we may need to work on the 0, 0, 0 assumption :)
#82


jluethi commented Sep 27, 2022

The /data/active/fractal/Liberali/1_well_15_fields_20_planes_SF_w_errors/D10_R1/220304_172545_220304_175557 search first dataset is processed successfully with the current Fractal version:
Bildschirmfoto 2022-09-27 um 21 40 57

Bildschirmfoto 2022-09-27 um 21 40 40

The other 2 search first test cases appear to have some issues in the metadata parsing (#109 & #110), but that's most likely unrelated to the search first part of it.

I'll need to briefly check whether the segmentation and measurements have worked correctly on this dataset. After that, we can close this issue.
