Requirements for UKB GWAS #67
Comments
I'd be happy to work on variant/sample stats (#29) if no one else is working on them.
@eric-czech how are you thinking about LD estimation/pruning, population structure estimation/pruning, and relatedness estimation/pruning? Does REGENIE include in the implementation some means of estimating these things as covariates for the regression, or are you just thinking of those operations as optimizations that can be implemented later?
To answer my own question, there are 3 stages to our work with UK Biobank
I've been thinking about this one too. I think we're going to feel Dask's poor handling of nested data when working with phenotypes, and I'd prefer to keep Spark out of this project as a dependency, so I think we should put that code into a separate repo if we find we do need Spark.
File an issue to track?
@eric-czech I just had a nice chat with @zietzm and @ntatonetti who are at Columbia and are experts in handling complex phenotypes and running many GWAS against them. They're interested in using Would you be open to making https://github.com/related-sciences/ukb-gwas-pipeline-nealelab public soon and potentially working with @zietzm to factor the phenotype handling code into its own repo, maybe something like |
For sure! Looking forward to seeing how we can better integrate phenotypes.
FYI @zietzm / @ntatonetti (cc: @hammer) the phenotype prep code we're currently using (via PHESANT) is here: ukb-gwas-pipeline-nealelab#phenotype_prep.smk. There is little to it yet other than running some messy, very inefficient R code to produce ~75 phenotypes that I wanted to attempt to validate against first. It would be great to hear your thoughts on how we might better define these, as well as how to improve the mechanics of creating them. I'm particularly interested in ICD code management since this pipeline doesn't address that.
To run a basic GWAS on UKB data, here are some of the operations we'll need support for:
is_autosome
function to filter variants by
There may be a few more beyond that, but I think anything remaining should be reasonable with Xarray/Dask alone.
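For illustration only, here is a minimal sketch of what an `is_autosome`-style filter could look like. It is not the project's implementation: the contig naming convention (human autosomes named "1" through "22", optionally prefixed with "chr"), the `filter_autosomal` helper, and the example variant tuples are all assumptions made for this example.

```python
# Hypothetical sketch: the names and contig conventions below are assumptions,
# not an existing API in this project.
HUMAN_AUTOSOMES = frozenset(str(i) for i in range(1, 23))


def is_autosome(contig: str) -> bool:
    """Return True if a contig name refers to a human autosome (1-22)."""
    # Accept both "1" and "chr1" style names (assumed conventions).
    name = contig[3:] if contig.lower().startswith("chr") else contig
    return name in HUMAN_AUTOSOMES


def filter_autosomal(variants):
    """Keep only (contig, position) variants located on an autosome."""
    return [v for v in variants if is_autosome(v[0])]


# Made-up example variants as (contig, position) pairs.
variants = [("1", 10583), ("chr22", 16050075), ("X", 60001), ("MT", 73)]
autosomal = filter_autosomal(variants)
```

In an array-based setting the same predicate would presumably be computed once per variant and used as a boolean mask over the variants dimension, rather than looping over tuples as done here.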