Parsing metadata from Yokogawa experiments #25

Closed · gusqgm opened this issue May 19, 2022 · 9 comments

gusqgm commented May 19, 2022

Any experiments coming from the Yokogawa microscope generate a couple of metadata files (with .mrf and .mlf extensions).

You can find information on how this is being parsed by Drogon here. The corresponding empty dataframes for each meta file are created just before this line.

Proper parsing of the metadata is important not only for being able to potentially save it (within? / alongside?) the .zarr output for further reference, but also for processing steps such as #8, where we need the positional information of each recorded field.
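For context, both files are XML, so a first look at the raw records only needs the standard library. A minimal sketch (the file name is an assumption; tag and attribute names depend on the Yokogawa namespace and should be checked against real files):

import xml.etree.ElementTree as ET

# Peek at the per-image records in the measurement log; using
# "MeasurementData.mlf" as the file name is an assumption, and the record
# structure should be verified against a real acquisition.
root = ET.parse("MeasurementData.mlf").getroot()
for record in root:
    print(record.tag, record.attrib)  # one entry per acquired image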

jluethi (Collaborator) commented May 19, 2022

Thanks for the clear description @gusqgm!
Do some of your test datasets already contain the corresponding metadata files? If so, which ones would those be? My test datasets currently don't contain the metadata files generated by the microscope. I do think both parsing from filenames and parsing from the metadata files should be supported use cases.

Otherwise, let's make a test dataset and mention it here, so we know what to test this parsing on.

gusqgm (Author) commented May 23, 2022

Hey @jluethi, all of the data from FMI currently has the associated metadata files within its folders.
I am currently looking into how to manipulate the meta files so that a minimal test example plus associated metadata can be created.

jluethi (Collaborator) commented Jun 17, 2022

I added a new metadata_parsing task in commits b98278648227709e1bdda74c8e17392bf28c79fc & 04266345e1dff319901819013e6a240a727e5ec2.

The parse_yokogawa_metadata function takes the paths of the two metadata files, parses them, and processes the information about all images of the acquisition. It then generates a few aggregated readouts. The parsing is based on the yokogawa_image_collection_task in Drogon written by Dario; I then added some aggregation on top that produces the following:

  1. A top-level print statement (how do we want to handle logging?) about errors observed in the metadata files from the microscope (sometimes the microscope fails to acquire a target the user initially specified; more on this in another issue), plus warnings.warn messages about each error.
  2. A total_files count based on the metadata. We could use this to check against the number of image files. It won't always hold (e.g. test cases with mismatched metadata, or complicated use cases we sometimes had where we manually moved away images after imaging that shouldn't be processed), but in real use cases they should match most of the time, and a warning would be good if they don't.
  3. The core of the metadata: the relevant information we need for each site, to be parsed into the OME-Zarr metadata as a table. The table looks like this:

[Screenshot: aggregated site metadata table, 2022-06-17]

The table is double-indexed by well & field_id (= field of view; e.g. field_id 1 is the file that contains F001 in its filename). It contains 3 types of information:
a) The bit_depth of the images: typically 16 bit, which is also what we have in all test cases. bit_depth determines the min & max values in the omero window metadata: 16 bit => min: 0, max: 2^16 - 1 = 65535

"omero":
      {
        "id": 1,
        "name": "example.tif",
        "version": "0.3",
        "channels": [
            {
                "active": true,
                "coefficient": 1,
                "color": "00FFFF",
                "family": "linear",
                "inverted": false,
                "label": "DAPI",
                "window": {
                    "end": 600,
                    "max": 65535,       <= bit_depth
                    "min": 0,           <= bit_depth
                    "start": 110
                }
            },

b) The scale information about pixel sizes in x, y & z, in micrometers. In the case of the table above, the value for each site would be "scale": [1.0, 5.0, 0.325, 0.325] (the first value is for the channel axis and is always 1).
c) The information about the positioning of sites in space (see fractal-analytics-platform/fractal-client#66 on why it's important that we parse this metadata). For the first site of the table above, the correct values would thus be "translation": [0.0, 0.0, -1353.6, -781.6] (channel & z are 0). I think napari will be able to handle negative values (e.g. the center being defined in the middle of the well), but if it doesn't, we can adapt the parsing to shift everything so that values start at 0 in each dimension. A sketch of how these entries could be assembled from the table follows the JSON example below.

{
    "multiscales": [
        {
            "axes": [
                {
                    "name": "c",
                    "type": "channel"
                },
                {
                    "name": "z",
                    "type": "space",
                    "unit": "micrometer"
                },
                {
                    "name": "y",
                    "type": "space"
                },
                {
                    "name": "x",
                    "type": "space"
                }
            ],
            "datasets": [
                {
                    "path": "0",
                    "coordinateTransformations": [{
                        "type": "scale",
                        "scale": [1.0, 1.0, 0.1625, 0.1625]               <= scale info goes here, describing the pixel size of the lowest pyramid level
                    },
                    {
                        "type": "translation",
                        "translation": [0.0, 0.0, 0.0, 416.0]           <= translation information goes here
                    }]
                },
                {
                    "path": "1",
                    "coordinateTransformations": [{
                        "type": "scale",
                        "scale": [1.0, 1.0, 0.4875, 0.4875]               <= scale info at higher pyramid levels need to be calculated based on the coarsening factor
                    }]
                },
                {
                    "path": "2",
                    "coordinateTransformations": [{
                        "type": "scale",
                        "scale": [1.0, 1.0, 1.4625, 1.4625]
                    }]
                },
                {
                    "path": "3",
                    "coordinateTransformations": [{
                        "type": "scale",
                        "scale": [1.0, 1.0, 4.3875, 4.3875]
                    }]
                },
                {
                    "path": "4",
                    "coordinateTransformations": [{
                        "type": "scale",
                        "scale": [1.0, 1.0, 13.1625, 13.1625]
                    }]
                }
            ],
            "version": "0.4"
        }
    ]
}
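To make the mapping from the table to these JSON snippets concrete, here is a minimal sketch, assuming the column names shown above (bit_depth, pixel_size_x/y/z, x_micrometer, y_micrometer); the helper name is purely hypothetical, not the actual Fractal API:

def build_site_transforms(row, coarsening_xy=3, num_levels=5):
    """Sketch: derive omero window bounds and coordinateTransformations
    for one well/field_id row of the aggregated metadata table."""
    # a) bit_depth -> omero window bounds (16 bit => min 0, max 65535)
    window = {"min": 0, "max": 2 ** int(row["bit_depth"]) - 1}

    # b) scale, ordered (c, z, y, x); the channel entry is always 1
    scale = [1.0, row["pixel_size_z"], row["pixel_size_y"], row["pixel_size_x"]]

    # c) translation, only on the highest-resolution level; channel & z are 0
    translation = [0.0, 0.0, row["y_micrometer"], row["x_micrometer"]]

    datasets = []
    for level in range(num_levels):
        factor = coarsening_xy ** level  # x & y pixel size grows per level
        transforms = [{
            "type": "scale",
            "scale": [scale[0], scale[1], scale[2] * factor, scale[3] * factor],
        }]
        if level == 0:
            transforms.append({"type": "translation", "translation": translation})
        datasets.append({"path": str(level),
                         "coordinateTransformations": transforms})
    return window, datasets

With pixel_size_x = 0.1625 and coarsening_xy = 3, this reproduces the 0.1625 / 0.4875 / 1.4625 / 4.3875 / 13.1625 series from the example above.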

The parsing performs quite a few consistency checks: e.g. bit_depth could be 8 or 16, but needs to be the same within all images of a well (it will likely always be 16 bit for Yokogawa images); x & y positions need to be the same for all images of a given site (otherwise something went seriously wrong); pixel sizes need to be consistent; etc.
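As a rough illustration, such checks could look like this on the aggregated dataframe (a sketch assuming the column names from the table above; not the actual implementation):

import warnings

def check_site_metadata(site_metadata):
    """Sketch: consistency checks on the well/field_id-indexed table."""
    # bit_depth must be identical within each well (likely 16 everywhere)
    for well, group in site_metadata.groupby(level="well"):
        if group["bit_depth"].nunique() != 1:
            raise ValueError(f"Inconsistent bit_depth in well {well}")

    # pixel sizes must be consistent across the whole acquisition
    for col in ("pixel_size_x", "pixel_size_y", "pixel_size_z"):
        if site_metadata[col].nunique() != 1:
            warnings.warn(f"Inconsistent values in column {col}")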

Even with all this, the parsing still runs in under a minute when processing the metadata for my 23-well example consisting of 165792 image records (slightly fewer images in the test case, long story). So it looks to me like it should scale pretty well for our use cases :)

jluethi (Collaborator) commented Jun 17, 2022

@tcompa Let's talk about how we best integrate the parsing functionality into the OME-Zarr creation workflow. Probably easiest to have a quick Zoom call on this, e.g. on Monday or as part of the Fractal call on Tuesday morning.

A few notes here from my side:
The goal of parsing the metadata into a concise, aggregated pandas dataframe is that all Yokogawa-to-OME-Zarr parsing could go via such a dataframe, independent of whether we have metadata files. I see 3 ways users could get the necessary information into this dataframe:

  1. They have the MeasurementData.mlf and MeasurementDetail.mrf files; these are parsed by the parsing function I just added.
  2. They create their own pandas table that follows the same specification (very useful to give users with special cases a path into Fractal).
  3. They have a somewhat simple case of full rectangular wells (i.e. the test cases we've been building so far). Based on very few parameters (pixel sizes, bit depth, grid dimensions & image sizes), we can calculate all the metadata parameters for this simple case; see the sketch after this list.
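A minimal sketch of what the case-3 wrapper could look like (function and column names are hypothetical; positions are top-left corners on a row-major grid):

import pandas as pd

def grid_site_metadata(wells, grid_ny, grid_nx, img_size_y, img_size_x,
                       pixel_size_x, pixel_size_y, pixel_size_z, bit_depth=16):
    """Sketch: build the site metadata table for full rectangular wells."""
    records = []
    for well in wells:
        field_id = 1
        for iy in range(grid_ny):
            for ix in range(grid_nx):
                records.append({
                    "well": well,
                    "field_id": field_id,
                    "bit_depth": bit_depth,
                    "pixel_size_x": pixel_size_x,
                    "pixel_size_y": pixel_size_y,
                    "pixel_size_z": pixel_size_z,
                    # top-left corner of each field, row-major grid layout
                    "x_micrometer": ix * img_size_x * pixel_size_x,
                    "y_micrometer": iy * img_size_y * pixel_size_y,
                })
                field_id += 1
    return pd.DataFrame(records).set_index(["well", "field_id"])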

Thus, if we integrate the aggregated pandas table with the site metadata into the workflow and write a little wrapper for the simple case, we should support all of the core use cases of Yokogawa data acquisition we're currently targeting for parsing into OME-Zarr: simple full wells, sparsely recorded fields on a grid, as well as "search-first, gridless" images arranged at random positions in a well (see here: https://github.com/fractal-analytics-platform/mwe_fractal/issues/23)

Also, this kind of intermediate pandas dataframe may make it easier to integrate data from other microscopes (file names also change, but we would have a target for how the metadata needs to be prepared so it can be read in).

jluethi (Collaborator) commented Jun 17, 2022

I also added MeasurementData.mlf & MeasurementDetail.mrf files to 2 of the UZH test cases so that we can also test it with those:
The 2x2 test case has metadata that fits: data/active/fractal/3D/PelkmansLab/CardiacMultiplexing/Cycle1_testSubset
And the single well, 9x8 test case as well: data/active/fractal/3D/PelkmansLab/CardiacMultiplexing/Cycle1_9x8_singleWell

tcompa (Collaborator) commented Jun 21, 2022

Note: this issue is currently on hold, waiting for the single-FOV vs multi-FOV decision (https://github.com/fractal-analytics-platform/mwe_fractal/issues/74).

jluethi (Collaborator) commented Jul 21, 2022

The updated table looks like this:
[Screenshot: updated site metadata table, 2022-07-21]

=> added columns for the x, y & z pixel counts; the metadata now also specifies the dimensions in pixels for each field of view, which is required to specify regions of interest.
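For example, the physical extent of each field-of-view region of interest then follows directly from these columns (a sketch; x_pixel/y_pixel/z_pixel as column names are an assumption, and site_metadata is the aggregated table from above):

# Sketch: physical ROI extent per field of view, in micrometers
site_metadata["len_x_micrometer"] = site_metadata["x_pixel"] * site_metadata["pixel_size_x"]
site_metadata["len_y_micrometer"] = site_metadata["y_pixel"] * site_metadata["pixel_size_y"]
site_metadata["len_z_micrometer"] = site_metadata["z_pixel"] * site_metadata["pixel_size_z"]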

jluethi (Collaborator) commented Aug 3, 2022

The metadata table can optionally contain a time column; its entries are pandas.Timestamp objects.
[Screenshot: metadata table with time column, 2022-08-03]
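For example, such a column could be produced like this (a sketch; "Time" as the raw column name is an assumption):

import pandas as pd

# Sketch: convert raw acquisition time strings into pandas.Timestamp objects
site_metadata["time"] = pd.to_datetime(site_metadata["Time"])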

@jluethi jluethi transferred this issue from fractal-analytics-platform/fractal-client Sep 2, 2022
@jluethi jluethi moved this from Done to Done Archive in Fractal Project Management Oct 5, 2022