Cohort subsets #394

Merged
merged 7 commits into from
Nov 24, 2020

Conversation

tomwhite
Collaborator

This addresses https://github.com/pystatgen/sgkit/issues/224#issuecomment-694350208 for specifying cohort subsets (for Garud H and PBS).

This is a draft since I haven't run it on MalariaGEN scale data yet, but I'd welcome any initial feedback.

@tomwhite
Collaborator Author

I've now run this successfully on the MalariaGEN data:

H stats: https://nbviewer.jupyter.org/github/tomwhite/shiny-train/blob/sgkit/notebooks/gwss/sgkit_h12.ipynb
PBS: https://nbviewer.jupyter.org/github/tomwhite/shiny-train/blob/sgkit/notebooks/gwss/sgkit_pbs.ipynb

Example usage:

sg.Garud_h(ds, cohorts=["ao_col"]) # list of cohort IDs
sg.pbs(ds, cohorts=[("ao_col", "ga_gam", "gw")]) # list of cohort ID triples
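For context on why `pbs` takes triples: the population branch statistic is defined over three cohorts, one focal and two for comparison. The sketch below uses the commonly cited unnormalised definition based on pairwise Fst values. The function name `pbs_from_fst` and its arguments are hypothetical and not part of sgkit, and sgkit's own `pbs` may differ in detail (for example by applying a normalisation).

```python
import math


def pbs_from_fst(fst_ab: float, fst_ac: float, fst_bc: float) -> float:
    """Unnormalised PBS for focal cohort A, given pairwise Fst values.

    Hypothetical helper for illustration only; not an sgkit function.
    """

    def branch_length(fst: float) -> float:
        # Transform Fst into an additive divergence measure.
        return -math.log(1.0 - fst)

    t_ab = branch_length(fst_ab)
    t_ac = branch_length(fst_ac)
    t_bc = branch_length(fst_bc)
    # Length of the branch leading to A in the three-cohort tree.
    return (t_ab + t_ac - t_bc) / 2.0
```

With a triple such as `("ao_col", "ga_gam", "gw")`, the first cohort would play the role of A under this convention, which is an assumption about ordering rather than something stated in this thread.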

@tomwhite marked this pull request as ready for review on November 23, 2020, 13:02
Collaborator

@jeromekelleher left a comment

This is powerful stuff, thanks @tomwhite! A minor comment above, merge away as you see fit.

sgkit/cohorts.py Outdated
Returns
-------
An array of shape ``(len(cohorts), tuple_len)``, where ``tuple_len`` is the length
of the tuples, or 1 if ``cohorts`` is a sequence of values.,
Collaborator

Trailing ,

tuples to an array of ints used to match samples in ``sample_cohorts``.

Cohorts can be specified by index (as used in ``sample_cohorts``), or a label, in
which case an ``index`` must be provided to find index locations for cohorts.
Collaborator

Any chance of a couple of simple examples here and the return values? I'm finding it a bit abstract and a concrete example would help understand what the function does.

Collaborator Author

Thanks for the review @jeromekelleher. Added two examples in pystatgen/sgkit@f4b45ea
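As a rough illustration of the conversion the docstring describes (this is not the code from that commit), the sketch below maps cohort values or tuples to rows of integer indexes. The name `cohorts_to_rows` is hypothetical, it returns nested lists as a stand-in for the array of shape ``(len(cohorts), tuple_len)``, and the plain list used as ``index`` is an assumption made to keep the example dependency-free.

```python
from typing import Hashable, List, Optional, Sequence, Tuple, Union

Cohort = Union[Hashable, Tuple[Hashable, ...]]


def cohorts_to_rows(
    cohorts: Sequence[Cohort], index: Optional[Sequence[Hashable]] = None
) -> List[List[int]]:
    """Convert cohorts (values or tuples) to rows of integer cohort indexes.

    Hypothetical helper for illustration; not the sgkit implementation.
    """
    rows: List[List[int]] = []
    tuple_len: Optional[int] = None
    for cohort in cohorts:
        # Treat a bare value as a tuple of length 1.
        members = cohort if isinstance(cohort, tuple) else (cohort,)
        if tuple_len is None:
            tuple_len = len(members)
        elif len(members) != tuple_len:
            raise ValueError("all cohort tuples must have the same length")
        row: List[int] = []
        for member in members:
            if isinstance(member, int):
                # Already an index, as used in sample_cohorts.
                row.append(member)
            else:
                # A label: look up its position in the provided index.
                if index is None:
                    raise ValueError("an index is required to look up cohort labels")
                row.append(list(index).index(member))
        rows.append(row)
    return rows


# Cohorts given by index, one per row.
print(cohorts_to_rows([0, 2]))
# Cohort triple given by label; the index ordering here is made up.
print(cohorts_to_rows([("ao_col", "ga_gam", "gw")], index=["ao_col", "ga_gam", "gw"]))
```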

@tomwhite added the auto-merge label ("Auto merge label for mergify test flight") on Nov 24, 2020
@codecov-io

codecov-io commented Nov 24, 2020

Codecov Report

Merging #394 (189c964) into master (3b13a7b) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master      #394   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           32        33    +1     
  Lines         2246      2276   +30     
=========================================
+ Hits          2246      2276   +30     
Impacted Files Coverage Δ
sgkit/cohorts.py 100.00% <100.00%> (ø)
sgkit/stats/popgen.py 100.00% <100.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data