Skip to content

PR 102: Summary stats #144

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
quansight-bot opened this issue Aug 27, 2020 · 0 comments
Closed

PR 102: Summary stats #144

quansight-bot opened this issue Aug 27, 2020 · 0 comments

Comments

@quansight-bot
Copy link

quansight-bot commented Aug 27, 2020

Issue metadata

Issue description

Issue metadata

Issue description

Issue metadata

Issue description

Issue metadata

Issue description

Issue metadata

Issue description

Pull request metadata

Pull request description

This is a draft implementation of #29 (based on [[[[[@eric-czech](https://github.com/eric-czech)](https://github.com/eric-czech)](https://github.com/eric-czech)](https://github.com/eric-czech)](https://github.com/eric-czech)'s code there) for discussion. I'm quite happy for someone else to take if over if interested (e.g. [[[[[@daletovar](https://github.com/daletovar)](https://github.com/daletovar)](https://github.com/daletovar)](https://github.com/daletovar)](https://github.com/daletovar), [[[[[@jerowe](https://github.com/jerowe)](https://github.com/jerowe)](https://github.com/jerowe)](https://github.com/jerowe)](https://github.com/jerowe)).

API docs need to be added still. Also sample summary stats.

A couple of points for discussion:

  1. Method naming. We currently have count_alleles (verb), but some of the new methods don't easily fit into that pattern: e.g. call_rate, variant_stats. Perhaps insisting the function name for each method is a verb is going too far, and we should just allow the name that we think sounds best (this seems to be the approach taken by Hail and Glow which both have a mixture of both styles).
  2. Conditionally computing variables, and merging. The allele_frequency function calls count_alleles, but perhaps it should check first if it has already been computed (I think [[[[[@eric-czech](https://github.com/eric-czech)](https://github.com/eric-czech)](https://github.com/eric-czech)](https://github.com/eric-czech)](https://github.com/eric-czech) does something similar in LD prune, for example). In that case, should it return the variant_allele_count variable in the returned dataset? I'm wondering whether we should actually merge the new variables with the original dataset as a general rule. The original dataset would be unchanged, and this would mean the user no longer had to call merge themselves, which might work better in pipelines, for example.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant