Add variant annotation functions #112

eric-czech · 2020-08-13T13:35:33Z

In quantitative genetics it is common not to treat alleles at a locus as equal. The "functional consequence" of each allele is important and the process for determining these consequences is well standardized in VEP (in coding regions at least).

Providing access to annotations like this, ideally using the LOFTEE plugin, would be very useful since it is a common task and not necessarily an easy one. Hail's vep and nirvana functions could be a good guide.

hammer · 2020-09-03T14:53:54Z

Given the approach in #227 to not vendor a PCA implementation, do you think we might want to just document how to use an external library to annotate variants, or do you think we will need some code inside sgkit to make variant annotation work nicely with our data structures?

eric-czech · 2020-09-03T17:36:00Z

🤔 The hail solution is to:

1. Have you manage the VEP (or Nirvana) install
2. Split up the variant metadata for a dataset and run the vep CLI on vcfs
3. Recombine the results
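To make the split/run/recombine flow concrete, here is a rough Python sketch. It is not how Hail implements this, and none of these names exist in sgkit: `chunk_records`, `vep_command`, `run_vep_on_chunk` and `annotate_in_chunks` are hypothetical helpers. It assumes a local `vep` install with an offline cache already set up, and that the VCF header and record lines are small enough to hold in memory.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Iterator, List, Sequence

# Hypothetical type: takes (header lines, record lines), returns annotated record lines.
ChunkRunner = Callable[[Sequence[str], Sequence[str]], List[str]]


def chunk_records(records: Sequence[str], chunk_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most chunk_size VCF record lines."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    for start in range(0, len(records), chunk_size):
        yield records[start:start + chunk_size]


def vep_command(input_vcf: str, output_vcf: str) -> List[str]:
    """Build a VEP command line (assumes `vep` is on PATH with a local cache)."""
    return [
        "vep", "-i", input_vcf, "-o", output_vcf,
        "--vcf", "--cache", "--offline", "--force_overwrite",
    ]


def run_vep_on_chunk(header: Sequence[str], records: Sequence[str]) -> List[str]:
    """Write one chunk to a temporary VCF, run VEP on it, return annotated records."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "chunk.vcf"
        dst = Path(tmp) / "chunk.vep.vcf"
        src.write_text("\n".join([*header, *records]) + "\n")
        subprocess.run(vep_command(str(src), str(dst)), check=True)
        # Drop header lines so chunks can be concatenated directly.
        return [ln for ln in dst.read_text().splitlines() if not ln.startswith("#")]


def annotate_in_chunks(
    header: Sequence[str],
    records: Sequence[str],
    chunk_size: int,
    run_chunk: ChunkRunner = run_vep_on_chunk,
) -> List[str]:
    """Split records, annotate each chunk, and recombine in the original order."""
    annotated: List[str] = []
    for chunk in chunk_records(records, chunk_size):
        annotated.extend(run_chunk(header, chunk))
    return annotated
```

The loop here is sequential. On a cluster each `run_chunk` call would be a separate task, which is exactly where the VEP install has to be present on every worker.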

Assuming cloud storage, I want to say it would actually be easier (from a user's perspective) for us to create a docker image with vep installed and then have something like a snakemake pipeline on GKE read exported variant data from Xarray/sgkit and produce results you can read back in easily.

Distributing variant data and running VEP on it isn't the hard part IMO; managing the installation on a cluster is what will be a pain for users. I'm not sure how to make that go away without docker, so there is perhaps some advantage to us having an sgkit docker image that descends from the dask image used by Helm with this extra stuff installed. That would certainly make it easier to avoid needing an external pipeline tool.

I would classify it a little differently than PCA though since the external library is so much harder to apply in this case.

jeromekelleher · 2020-09-03T18:47:47Z

This feels to me like it's outside our remit - integrating with the Pydata ecosystem. If we start front-ending VEP for users, where do we stop? Certainly we should support processing VEP annotations but I think running VEP should be outside our scope.
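As a sketch of what "processing VEP annotations" could look like on the reading side: when VEP writes VCF output, it puts its annotations in a `CSQ` INFO entry, with one comma-separated item per annotation and pipe-separated fields whose names are listed after `Format:` in the CSQ header line. The helpers below (`csq_fields_from_header`, `parse_csq`) are hypothetical, not part of sgkit or VEP, and the header and record values are made up for illustration.

```python
import re
from typing import Dict, List


def csq_fields_from_header(header_line: str) -> List[str]:
    """Extract CSQ field names from the ##INFO=<ID=CSQ,...> header line."""
    match = re.search(r'Format: ([^">]+)', header_line)
    if match is None:
        raise ValueError("no 'Format:' section found in CSQ header line")
    return match.group(1).strip().split("|")


def parse_csq(info: str, fields: List[str]) -> List[Dict[str, str]]:
    """Return one dict per annotation found in the CSQ entry of an INFO string."""
    for entry in info.split(";"):
        if entry.startswith("CSQ="):
            return [
                dict(zip(fields, annotation.split("|")))
                for annotation in entry[len("CSQ="):].split(",")
            ]
    return []  # record carries no CSQ annotation


# Illustrative inputs only; a real header lists many more fields.
example_header = (
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL">'
)
example_info = "AC=1;CSQ=T|missense_variant|MODERATE|GENE1,T|intron_variant|MODIFIER|GENE1"

fields = csq_fields_from_header(example_header)
annotations = parse_csq(example_info, fields)
```

Each record ends up with a variable-length list of annotations, so fitting this into a fixed-shape Xarray dataset would need an extra decision, for example keeping only the most severe consequence per variant.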

eric-czech · 2020-09-04T13:47:03Z

> This feels to me like it's outside our remit - integrating with the Pydata ecosystem. If we start front-ending VEP for users, where do we stop?

Good point. I can see there being some satellite pystatgen repos that are specific to putting some kind of compatible front end on hard-to-scale CLI tools.

eric-czech mentioned this issue Aug 13, 2020

Requirements for UKB GWAS #67

hammer added the core operations Issues related to domain-specific functionality such as LD pruning, PCA, association testing, etc. label Aug 13, 2020
