Add read_vcfzarr #40
Conversation
Hi @tomwhite, one thing to account for is that there are two ways that the scikit-allel-style zarrs can be laid out.

Grouped by contig

The main way that I lay them out (and recommend to others) is to group the data by contig. This means there is one group for each contig in the zarr, and within each of those contig groups there are "variants" and "calldata" groups. In this case you'd expect to navigate to the arrays using paths like "/{contig}/variants/POS" and "/{contig}/calldata/GT". E.g., try this:

import zarr
import fsspec
store = fsspec.get_mapper('gs://1000genomes-zarr/ALL.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes')
h1k_phase3 = zarr.open_consolidated(store=store)
print(h1k_phase3.tree())

Here's a truncated view of the output...

Note that with this layout you never need to read the "/{contig}/variants/CHROM" arrays, because the contig is implied by the grouping. Indeed, the CHROM arrays may even be absent.

Ungrouped

Alternatively, you can use scikit-allel to create a zarr that stores data from the whole genome together, i.e., there are no contig groups, and arrays contain data from the whole genome. In this case you would expect to navigate directly to the arrays, e.g., "/variants/POS" or "/calldata/GT". With this layout you would of course need to read the "/variants/CHROM" array to determine the contig for each variant.

Because I use grouped-by-contig as the recommended layout, when I prototyped a vcfzarr_to_xarray function recently I added logic to handle this grouping by contig and to concatenate the data from each group. I'd suggest doing something similar here, i.e.:
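(The snippet that followed was not captured. Below is a minimal sketch of what that per-contig concatenation logic might look like; the function name is hypothetical and the use of dask is an assumption.)

import zarr
import dask.array as da

def concat_contig_groups(vcfzarr):
    # Ungrouped layout: arrays live directly under /variants and /calldata.
    if "variants" in vcfzarr and "calldata" in vcfzarr:
        return (da.from_zarr(vcfzarr["variants/POS"]),
                da.from_zarr(vcfzarr["calldata/GT"]))
    # Grouped-by-contig layout: concatenate the arrays from each contig group.
    contigs = list(vcfzarr.group_keys())
    pos = da.concatenate([da.from_zarr(vcfzarr[f"{c}/variants/POS"]) for c in contigs])
    gt = da.concatenate([da.from_zarr(vcfzarr[f"{c}/calldata/GT"]) for c in contigs])
    return pos, gt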
Just to add that there is another variation in layout that would be good to accommodate, which applies to how the sample information is stored. Most of the time only the sample identifiers are stored, because that is all that is present in the source VCF. In this case I would expect to find an array at the path "/samples" which holds the sample identifiers. An example is the Ag1000G phase 2 data:

>>> store = fsspec.get_mapper('gs://ag1000g-release/phase2.AR1/variation/main/zarr/all/ag1000g.phase2.ar1')
>>> ag1000g_phase2 = zarr.open_consolidated(store=store)
>>> ag1000g_phase2['samples']
<zarr.core.Array '/samples' (1142,) object>

However, I'm also starting to add in more sample data, in which case the sample identifiers will be in a subgroup and accessed via a path like "/samples/ID". E.g., the human 1000 genomes phase 3:

So it would be good to add some logic to check whether "/samples" is an array or a group, and deal with it accordingly.
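(A minimal sketch of that check, assuming zarr v2 and that vcfzarr is an open root group as in the examples above:)

import zarr

samples = vcfzarr["samples"]
if isinstance(samples, zarr.core.Array):
    sample_ids = samples[:]        # identifiers stored directly as an array
else:
    sample_ids = samples["ID"][:]  # identifiers stored in a subgroup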
def read_vcfzarr(path: PathType) -> xr.Dataset:
    """Read a VCF Zarr file.

    Loads VCF variant, sample, and genotype data as Dask arrays within a Dataset
    from a Zarr file created using scikit-allel's `vcf_to_zarr` function.

    Since `vcf_to_zarr` does not preserve phasing information, there is no
    `call/genotype_phased` variable in the resulting dataset.

    Parameters
    ----------
    path : PathType
        Path to the Zarr file.

    Returns
    -------
    xr.Dataset
        The dataset of genotype calls, created using `create_genotype_call_dataset`.
    """
    vcfzarr = zarr.open_group(str(path), mode="r")
Suggested change:

# requires: from typing import Mapping, Union
def read_vcfzarr(store: Union[str, Mapping], consolidated: bool = False) -> xr.Dataset:
    """Read data in Zarr format that was created using scikit-allel's `vcf_to_zarr` function.

    Loads variant, sample, and genotype data as Dask arrays within a Dataset.

    Since `vcf_to_zarr` does not preserve phasing information, there is no
    `call/genotype_phased` variable in the resulting dataset.

    Parameters
    ----------
    store : Union[str, Mapping]
        Either a string providing a URL or local file system path, or a store object.
    consolidated : bool
        Set True if the store uses consolidated metadata.

    Returns
    -------
    xr.Dataset
        The dataset of genotype calls, created using `create_genotype_call_dataset`.
    """
    if isinstance(store, str):
        # assume the string is an fsspec-style URL or path
        import fsspec

        store = fsspec.get_mapper(store)
    if consolidated:
        vcfzarr = zarr.open_consolidated(store=store, mode="r")
    else:
        vcfzarr = zarr.open(store=store, mode="r")
Two suggestions here.

The first suggestion is to allow the first argument to be either a string or a mapping (dict-like) object. If it is a string, then assume it's an fsspec-style URL and load it via fsspec - this gives the ability to read from cloud object stores. If it's a mapping, then treat it as a store - this gives full flexibility to pass in any type of zarr store.

The second suggestion is to expose a consolidated argument, which allows the user to specify if consolidated metadata has been used (this is an optimisation for cloud stores).
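(A hypothetical usage sketch for the suggested signature; the argument values are assumptions based on the stores mentioned above.)

# string argument: treated as an fsspec-style URL
ds = read_vcfzarr(
    "gs://1000genomes-zarr/ALL.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes",
    consolidated=True,
)

# mapping argument: any zarr store can be passed in directly
import fsspec
store = fsspec.get_mapper("gs://ag1000g-release/phase2.AR1/variation/main/zarr/all/ag1000g.phase2.ar1")
ds = read_vcfzarr(store, consolidated=True)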
Hey @tomwhite, how are you thinking about putting this into […]? Relatedly, do you imagine that […]?
Yikes, have we already hit a need for […]?
Surely this is just […]. I'm starting to think the vcfzarr function should live in sgkit-vcf though. What's the advantage of putting it in sgkit vs sgkit-vcf? I think it would really help if all the code that knows about VCF structure lived in a single repo, and we converted to our own on-disk format. Conversion is a one-off cost - we should be focusing on making our own disk format work really well for a whole range of tasks, not on how to make working with VCFs slightly less awful.
@alimanfoo thanks for the suggestions about variations in the Zarr layout that we should support. I assume they can all be generated using scikit-allel's vcf_to_zarr?
I think it will be agnostic, since you can just use Xarray's Dataset.to_zarr method. It would be a good idea to add some metadata to indicate the data provenance, e.g. the file it was loaded from, a timestamp, etc.
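(A minimal sketch of that write path; the paths and attribute names here are hypothetical.)

import datetime

ds = read_vcfzarr("example.vcf.zarr")                      # hypothetical input path
ds.attrs["source"] = "example.vcf.zarr"                    # provenance: source file
ds.attrs["created"] = datetime.datetime.now().isoformat()  # provenance: timestamp
ds.to_zarr("example.sgkit.zarr")                           # standard xarray zarr write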
Right - that's exactly what I was suggesting.
I'm fine with that. I might iterate on this in this PR first though.
At some point in the future, I imagine we might like to develop new functions for reading VCF files and writing out to zarr that is natively structured in the sgkit convention. That new code would seem to naturally live in a separate sgkit-vcf repo, e.g., because it might have specific dependencies for reading VCF such as htslib. This PR, however, is doing something a bit different, which is opening scikit-allel-flavoured zarr data. That data is structured in a way that is very close to the sgkit xarray convention, so it's really a thin shim to do the transformation. The data happened to originate from VCF, but that is a somewhat indirect relationship. So I'd suggest that this PR and the open_vcfzarr() function go into sgkit, at least for now. It's a relatively small piece of code, and we wouldn't seem to gain much from factoring it out into a separate package.
I imagine that ultimately sgkit will be able to open zarr data in the same way, regardless of which format the data originated in. That will depend on having support for reading different formats and outputting all of them to a common sgkit zarr format, which I imagine will follow the xarray zarr conventions and so can then be opened with xarray.open_zarr(). For the moment, the scikit-allel-flavoured zarr is a bit of a one-off case, because the data is already in zarr but isn't quite in sgkit/xarray conventions and so needs a little adapter.
Agreed, although a nuance worth discussing at some point is that the path to zarr may need to be different for different file formats, for practical reasons to do with executing the transformation for large datasets. I'm guessing there will be two different cases. One case would be formats like plink bed/bim/fam, where the data can be parsed efficiently and it is convenient and performant to go via xarray.to_zarr() when performing the conversion to zarr. The other case might be formats like VCF, where parsing is slow and we might want more specialised functions that read directly from VCF and write out to zarr, not going via xarray.to_zarr(). But in both cases the output should be the same, i.e., zarr data that conforms to the xarray+sgkit zarr conventions.
Small comment: do we want to name this "open_vcfzarr" rather than "read_vcfzarr", just to follow the naming convention in xarray for opening files (e.g., xarray.open_zarr)?
I've updated this PR to use the […]. I've opened a new issue (#56) for supporting more flexible layout options, since they can build on this PR, which adds a basic path for importing VCF data via scikit-allel's vcf_to_zarr. I couldn't see a way to generate these alternate layouts using vcf_to_zarr.
For writing files the convention is […].
Apart from the naming question ([…]).
Hi @tomwhite, apologies for the slow feedback on this. Happy to see this merged and to deal with the points raised in follow-up PRs - your call. From my point of view, this function becomes valuable when we can load data from the Anopheles gambiae 1000 Genomes Project (Ag1000G), which is the WGS dataset that I work on and that mostly drives the requirements I have for sgkit. Ag1000G data are stored in Google Cloud Storage, use consolidated metadata, and are grouped by contig. I.e., I'd like to be able to do:
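(The code example here was not captured; a plausible sketch given the requirements just stated, reusing the Ag1000G URL and the consolidated flag suggested above:)

ds = read_vcfzarr(
    "gs://ag1000g-release/phase2.AR1/variation/main/zarr/all/ag1000g.phase2.ar1",
    consolidated=True,
)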
In general, I recommend grouping by contig to scikit-allel users, and so I think this is the more common case. So FWIW I would be inclined to add support for fsspec-style URLs and for consolidated metadata in this PR, as they are small changes. Also FWIW I would be inclined to add support for grouping by contig and to make that the default.

One partially-related comment: currently you do some reading of data values to ascertain the dtype for the variant_alleles array and to compute the variant_contig array. This is going to cause some lag for the user, particularly for large datasets, and it would be great to avoid any reading of data or computation within this function if at all possible.

I think reading data could be avoided when determining the dtype for the variant_alleles array. I don't think you need to know the maximum length of the string values; you could just take the maximum length of the dtype in the input REF and ALT arrays. E.g., if REF is S1 and ALT is S1 then you can use S1 for variant_alleles. Similarly, if REF was S3 and ALT was S5 you could use S5 for variant_alleles. (See the sketch after this comment.)

Also, reading variants/CHROM and computing the variant_contig array is not necessary if the input data are grouped by contig. This is because the group name tells you which chromosome variants belong to, without having to read the variants/CHROM array.

Hope that's useful, happy to play this however you think best.
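(A minimal sketch of the dtype suggestion; the helper name is hypothetical.)

import numpy as np

def alleles_dtype(ref_dtype: np.dtype, alt_dtype: np.dtype) -> np.dtype:
    # Use the wider of the two fixed-width bytes dtypes, without reading
    # any data values: S1 + S1 -> S1; S3 + S5 -> S5.
    return np.dtype(f"S{max(ref_dtype.itemsize, alt_dtype.itemsize)}")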
Thanks for the comments @alimanfoo!
I agree that is where we should be aiming to get to. My goal with this PR was a bit more modest: it was to validate the sgkit representation and ensure that there was a path from VCF to sgkit. I think those things have been achieved, so I'd like to keep this PR to that more restricted scope and address the other things in follow-on PRs (https://github.com/pystatgen/sgkit/issues/56). In general, I favour smaller PRs where possible, as they make things easier to review and merge. (BTW do you have any hints on how to generate these alternative layouts? I couldn't see how to do it with vcf_to_zarr.)
That would certainly be more efficient. I had a look at doing this, but the dtype of REF and ALT for the Zarr files I generated with vcf_to_zarr was object, so the string length can't be determined from the dtype.
Sure, totally understand.
Do you mean grouping by chromosome? (If so, example here.) Or the alternative ways to store sample variables?
Ah yes, I forgot I sometimes use object dtype for strings with zarr. I would suggest accepting object dtype and leaving it as-is, i.e., the variant_alleles array can be either S or object dtype.
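(Extending the earlier hypothetical helper to cover this case:)

import numpy as np

def alleles_dtype(ref_dtype: np.dtype, alt_dtype: np.dtype) -> np.dtype:
    if ref_dtype == object or alt_dtype == object:
        # Variable-length strings: leave object dtype as-is.
        return np.dtype(object)
    return np.dtype(f"S{max(ref_dtype.itemsize, alt_dtype.itemsize)}")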
+1 to that, I pulled it out in https://github.com/pystatgen/sgkit/issues/98 since I think we should consider the same for plink/bgen.
I've fixed this now. This is ready to merge.
LGTM
This implements @alimanfoo's suggestion at https://github.com/pystatgen/sgkit/issues/2#issuecomment-646138160 for reading VCF data via Zarr.

A few notes:
- Since vcf_to_zarr does not preserve phasing information, the call/genotype_phased variable is not populated.
- IDs can be "." in the VCF, meaning missing, so I added a variant/id_mask variable.
- The docs said the alleles were S1, whereas they can be of any length as the example shows, so I've updated the docs. Also, the max length is computed as a part of loading, so we can assign the type (e.g. "S3" if the max allele length is 3).
- I moved encode_array from sgkit-plink to this repo. It's also needed by sgkit-bgen, so it should live here so it can be shared, but probably in a different module.