Parsing metadata from Yokogawa experiments #25

Closed · gusqgm opened this issue May 19, 2022 · 9 comments

gusqgm commented May 19, 2022

Any experiments coming from the Yokogawa microscope generate a couple of metadata files (with .mrf and .mlf extensions).

You can find information on how this is being parsed by Drogon here. The corresponding empty dataframes for each meta file are created just before this line.

Proper parsing of the metadata is important not only for being able to potentially save it (within? / alongside?) the .zarr output for further reference, but also for processing steps such as #8, where we need the positional information of each recorded field.
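For context, both files are XML, so a first look at the raw records only needs the standard library. A minimal sketch (the file name is an assumption; tag and attribute names depend on the Yokogawa namespace and should be checked against real files):

import xml.etree.ElementTree as ET

# Peek at the per-image records in the measurement log; using
# "MeasurementData.mlf" as the file name is an assumption, and the record
# structure should be verified against a real acquisition.
root = ET.parse("MeasurementData.mlf").getroot()
for record in root:
    print(record.tag, record.attrib)  # one entry per acquired image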

jluethi (Collaborator) commented May 19, 2022

Thanks for the clear description @gusqgm!
Do some of your test datasets already contain the corresponding metadata files? If so, which ones would those be? My test datasets currently don't contain the metadata files generated by the microscope. I do think both parsing from filenames and parsing from the metadata files should be supported use cases.

Otherwise, let's make a test dataset and mention it here, so we know what to test this parsing on.

gusqgm (Author) commented May 23, 2022

Hey @jluethi, all of the data from FMI currently has the associated metadata files within its folders.
I am currently looking into how to manipulate the meta files so that a minimal test example plus associated metadata can be created.

jluethi (Collaborator) commented Jun 17, 2022

I added a new metadata_parsing task in commits b98278648227709e1bdda74c8e17392bf28c79fc & 04266345e1dff319901819013e6a240a727e5ec2.

The parse_yokogawa_metadata function takes the paths of the two metadata files, parses them, and processes the information about all images of the acquisition. It then generates a few aggregated readouts. The parsing is based on the yokogawa_image_collection_task in Drogon written by Dario; I then added some aggregation on top that produces the following:

  1. A top-level print statement (how do we want to handle logging?) about errors observed in the metadata files from the microscope (sometimes the microscope fails to acquire a target the user initially specified; more on this in another issue), plus warnings.warn messages about each error.
  2. A total_files count based on the metadata. We could use this to check against the number of image files. It won't always hold (e.g. test cases with mismatched metadata, or complicated use cases we sometimes had where we manually moved away images after imaging that shouldn't be processed), but in real use cases they should match most of the time, and a warning would be good if they don't.
  3. The core of the metadata: the relevant information we need for each site, to be parsed into the OME-Zarr metadata as a table. The table looks like this:

[Screenshot: aggregated site metadata table, 2022-06-17]

The table is double-indexed by well & field_id (= field of view; e.g. field_id 1 is the file that contains F001 in its filename). It contains 3 types of information:
a) The bit_depth of the images: typically 16 bit, which is also what we have in all test cases. bit_depth determines the min & max values in the omero window metadata: 16 bit => min: 0, max: 2^16 - 1 = 65535

"omero":
      {
        "id": 1,
        "name": "example.tif",
        "version": "0.3",
        "channels": [
            {
                "active": true,
                "coefficient": 1,
                "color": "00FFFF",
                "family": "linear",
                "inverted": false,
                "label": "DAPI",
                "window": {
                    "end": 600,
                    "max": 65535,       <= bit_depth
                    "min": 0,           <= bit_depth
                    "start": 110
                }
            },

b) The scale information about pixel sizes in x, y & z, in micrometers. In the case of the table above, the value for each site would be "scale": [1.0, 5.0, 0.325, 0.325] (the first value is for the channel axis and is always 1).
c) The information about the positioning of sites in space (see fractal-analytics-platform/fractal-client#66 on why it's important that we parse this metadata). For the first site of the table above, the correct values would thus be "translation": [0.0, 0.0, -1353.6, -781.6] (channel & z are 0). I think napari will be able to handle negative values (e.g. the center being defined in the middle of the well), but if it doesn't, we can adapt the parsing to shift everything so that values start at 0 in each dimension. A sketch of how these entries could be assembled from the table follows the JSON example below.

{
    "multiscales": [
        {
            "axes": [
                {
                    "name": "c",
                    "type": "channel"
                },
                {
                    "name": "z",
                    "type": "space",
                    "unit": "micrometer"
                },
                {
                    "name": "y",
                    "type": "space"
                },
                {
                    "name": "x",
                    "type": "space"
                }
            ],
            "datasets": [
                {
                    "path": "0",
                    "coordinateTransformations": [{
                        "type": "scale",
                        "scale": [1.0, 1.0, 0.1625, 0.1625]               <= scale info goes here, describing the pixel size of the lowest pyramid level
                    },
                    {
                        "type": "translation",
                        "translation": [0.0, 0.0, 0.0, 416.0]           <= translation information goes here
                    }]
                },
                {
                    "path": "1",
                    "coordinateTransformations": [{
                        "type": "scale",
                        "scale": [1.0, 1.0, 0.4875, 0.4875]               <= scale info at higher pyramid levels need to be calculated based on the coarsening factor
                    }]
                },
                {
                    "path": "2",
                    "coordinateTransformations": [{
                        "type": "scale",
                        "scale": [1.0, 1.0, 1.4625, 1.4625]
                    }]
                },
                {
                    "path": "3",
                    "coordinateTransformations": [{
                        "type": "scale",
                        "scale": [1.0, 1.0, 4.3875, 4.3875]
                    }]
                },
                {
                    "path": "4",
                    "coordinateTransformations": [{
                        "type": "scale",
                        "scale": [1.0, 1.0, 13.1625, 13.1625]
                    }]
                }
            ],
            "version": "0.4"
        }
    ]
}
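To make the mapping from the table to these JSON snippets concrete, here is a minimal sketch, assuming the column names shown above (bit_depth, pixel_size_x/y/z, x_micrometer, y_micrometer); the helper name is purely hypothetical, not the actual Fractal API:

def build_site_transforms(row, coarsening_xy=3, num_levels=5):
    """Sketch: derive omero window bounds and coordinateTransformations
    for one well/field_id row of the aggregated metadata table."""
    # a) bit_depth -> omero window bounds (16 bit => min 0, max 65535)
    window = {"min": 0, "max": 2 ** int(row["bit_depth"]) - 1}

    # b) scale, ordered (c, z, y, x); the channel entry is always 1
    scale = [1.0, row["pixel_size_z"], row["pixel_size_y"], row["pixel_size_x"]]

    # c) translation, only on the highest-resolution level; channel & z are 0
    translation = [0.0, 0.0, row["y_micrometer"], row["x_micrometer"]]

    datasets = []
    for level in range(num_levels):
        factor = coarsening_xy ** level  # x & y pixel size grows per level
        transforms = [{
            "type": "scale",
            "scale": [scale[0], scale[1], scale[2] * factor, scale[3] * factor],
        }]
        if level == 0:
            transforms.append({"type": "translation", "translation": translation})
        datasets.append({"path": str(level),
                         "coordinateTransformations": transforms})
    return window, datasets

With pixel_size_x = 0.1625 and coarsening_xy = 3, this reproduces the 0.1625 / 0.4875 / 1.4625 / 4.3875 / 13.1625 series from the example above.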

The parsing performs quite a few consistency checks: e.g. bit_depth could be 8 or 16, but needs to be the same within all images of a well (it will likely always be 16 bit for Yokogawa images); x & y positions need to be the same for all images of a given site (otherwise something went seriously wrong); pixel sizes need to be consistent; etc.
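As a rough illustration, such checks could look like this on the aggregated dataframe (a sketch assuming the column names from the table above; not the actual implementation):

import warnings

def check_site_metadata(site_metadata):
    """Sketch: consistency checks on the well/field_id-indexed table."""
    # bit_depth must be identical within each well (likely 16 everywhere)
    for well, group in site_metadata.groupby(level="well"):
        if group["bit_depth"].nunique() != 1:
            raise ValueError(f"Inconsistent bit_depth in well {well}")

    # pixel sizes must be consistent across the whole acquisition
    for col in ("pixel_size_x", "pixel_size_y", "pixel_size_z"):
        if site_metadata[col].nunique() != 1:
            warnings.warn(f"Inconsistent values in column {col}")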

Even with all this, the parsing still runs in under a minute when processing the metadata for my 23-well example consisting of 165792 image records (slightly fewer images in the test case, long story). So it looks to me like it should scale pretty well for our use cases :)

jluethi (Collaborator) commented Jun 17, 2022

@tcompa Let's talk about how we best integrate the parsing functionality into the OME-Zarr creation workflow. Probably easiest to have a quick Zoom call on this, e.g. on Monday or as part of the Fractal call on Tuesday morning.

A few notes here from my side:
The goal of parsing the metadata into a concise, aggregated pandas dataframe is that all Yokogawa-to-OME-Zarr parsing could go via such a dataframe, independent of whether we have metadata files. I see 3 ways users could get the necessary information into this dataframe:

  1. They have the MeasurementData.mlf and MeasurementDetail.mrf files; these are parsed by the parsing function I just added.
  2. They create their own pandas table that follows the same specification (very useful to give users with special cases a path into Fractal).
  3. They have a somewhat simple case of full rectangular wells (i.e. the test cases we've been building so far). Based on very few parameters (pixel sizes, bit depth, grid dimensions & image sizes), we can calculate all the metadata parameters for this simple case; see the sketch after this list.
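A minimal sketch of what the case-3 wrapper could look like (function and column names are hypothetical; positions are top-left corners on a row-major grid):

import pandas as pd

def grid_site_metadata(wells, grid_ny, grid_nx, img_size_y, img_size_x,
                       pixel_size_x, pixel_size_y, pixel_size_z, bit_depth=16):
    """Sketch: build the site metadata table for full rectangular wells."""
    records = []
    for well in wells:
        field_id = 1
        for iy in range(grid_ny):
            for ix in range(grid_nx):
                records.append({
                    "well": well,
                    "field_id": field_id,
                    "bit_depth": bit_depth,
                    "pixel_size_x": pixel_size_x,
                    "pixel_size_y": pixel_size_y,
                    "pixel_size_z": pixel_size_z,
                    # top-left corner of each field, row-major grid layout
                    "x_micrometer": ix * img_size_x * pixel_size_x,
                    "y_micrometer": iy * img_size_y * pixel_size_y,
                })
                field_id += 1
    return pd.DataFrame(records).set_index(["well", "field_id"])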

Thus, if we integrate the aggregated pandas table with the site metadata into the workflow and write a little wrapper for the simple case, we should support all of the core use cases of Yokogawa data acquisition we're currently targeting for parsing into OME-Zarr: simple full wells, sparsely recorded fields on a grid, as well as "search-first, gridless" images arranged at random positions in a well (see here: https://github.com/fractal-analytics-platform/mwe_fractal/issues/23)

Also, this kind of intermediate pandas dataframe may make it easier to integrate data from other microscopes (file names also change, but we would have a target for how the metadata needs to be prepared so it can be read in).

jluethi (Collaborator) commented Jun 17, 2022

I also added MeasurementData.mlf & MeasurementDetail.mrf files to 2 of the UZH test cases so that we can also test it with those:
The 2x2 test case has metadata that fits: data/active/fractal/3D/PelkmansLab/CardiacMultiplexing/Cycle1_testSubset
And the single well, 9x8 test case as well: data/active/fractal/3D/PelkmansLab/CardiacMultiplexing/Cycle1_9x8_singleWell

tcompa (Collaborator) commented Jun 21, 2022

Note: this issue is currently on hold, waiting for the single-FOV vs multi-FOV decision (https://github.com/fractal-analytics-platform/mwe_fractal/issues/74).

jluethi (Collaborator) commented Jul 21, 2022

The updated table looks like this:
[Screenshot: updated site metadata table, 2022-07-21]

=> added columns for the x, y & z pixel counts; the metadata now also specifies the dimensions in pixels for each field of view, which is required to specify regions of interest.
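For example, the physical extent of each field-of-view region of interest then follows directly from these columns (a sketch; x_pixel/y_pixel/z_pixel as column names are an assumption, and site_metadata is the aggregated table from above):

# Sketch: physical ROI extent per field of view, in micrometers
site_metadata["len_x_micrometer"] = site_metadata["x_pixel"] * site_metadata["pixel_size_x"]
site_metadata["len_y_micrometer"] = site_metadata["y_pixel"] * site_metadata["pixel_size_y"]
site_metadata["len_z_micrometer"] = site_metadata["z_pixel"] * site_metadata["pixel_size_z"]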

jluethi (Collaborator) commented Aug 3, 2022

The metadata table can optionally contain a time column; its entries are pandas.Timestamp objects.
[Screenshot: metadata table with time column, 2022-08-03]
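For example, such a column could be produced like this (a sketch; "Time" as the raw column name is an assumption):

import pandas as pd

# Sketch: convert raw acquisition time strings into pandas.Timestamp objects
site_metadata["time"] = pd.to_datetime(site_metadata["Time"])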

@jluethi jluethi transferred this issue from fractal-analytics-platform/fractal-client Sep 2, 2022
@jluethi jluethi moved this from Done to Done Archive in Fractal Project Management Oct 5, 2022