Cohort subsets #394

tomwhite · 2020-11-19T15:24:52Z

This addresses https://github.com/pystatgen/sgkit/issues/224#issuecomment-694350208 for specifying cohort subsets (for Garud H and PBS).

This is a draft since I haven't run it on MalariaGEN scale data yet, but I'd welcome any initial feedback.

tomwhite · 2020-11-23T13:02:39Z

I've now run this successfully on the MalariaGEN data:

H stats: https://nbviewer.jupyter.org/github/tomwhite/shiny-train/blob/sgkit/notebooks/gwss/sgkit_h12.ipynb
PBS: https://nbviewer.jupyter.org/github/tomwhite/shiny-train/blob/sgkit/notebooks/gwss/sgkit_pbs.ipynb

Example usage:

sg.Garud_h(ds, cohorts=["ao_col"]) # list of cohort IDs

sg.pbs(ds, cohorts=[("ao_col", "ga_gam", "gw")]) # list of cohort ID triples

jeromekelleher

This is powerful stuff, thanks @tomwhite! A minor comment above, merge away as you see fit.

jeromekelleher · 2020-11-24T11:58:06Z

sgkit/cohorts.py

+    Returns
+    -------
+    An array of shape ``(len(cohorts), tuple_len)``, where ``tuple_len`` is the length
+    of the tuples, or 1 if ``cohorts`` is a sequence of values.,


jeromekelleher · 2020-11-24T12:00:35Z

sgkit/cohorts.py

+    tuples to an array of ints used to match samples in ``sample_cohorts``.
+
+    Cohorts can be specified by index (as used in ``sample_cohorts``), or a label, in
+    which case an ``index`` must be provided to find index locations for cohorts.


Any chance of a couple of simple examples here and the return values? I'm finding it a bit abstract and a concrete example would help understand what the function does.

tomwhite · 2020-11-24

Thanks for the review @jeromekelleher. I've added two examples in pystatgen/sgkit@f4b45ea.
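For readers without the commit at hand, the conversion described in the docstring above can be sketched roughly as follows. This is a pure-Python illustration, not the code from `sgkit/cohorts.py`: the name `cohorts_to_rows` is hypothetical, a plain list of labels stands in for the `index` argument, nested lists stand in for the returned array, and the ordering of the cohort labels is assumed for the example.

```python
from typing import List, Optional, Sequence, Tuple, Union

CohortId = Union[int, str]
CohortSpec = Union[CohortId, Tuple[CohortId, ...]]


def cohorts_to_rows(
    cohorts: Sequence[CohortSpec],
    index: Optional[Sequence[str]] = None,
) -> List[List[int]]:
    """Hypothetical stand-in: convert cohorts or cohort tuples to rows of ints."""
    # Map each label to its position in the (assumed) label index.
    positions = {label: i for i, label in enumerate(index or [])}

    def to_int(cohort: CohortId) -> int:
        if isinstance(cohort, int):
            return cohort  # already an index, as used in sample_cohorts
        if index is None:
            raise ValueError("an index is required to resolve cohort labels")
        return positions[cohort]

    rows = []
    for entry in cohorts:
        # A bare value is treated as a tuple of length 1.
        members = entry if isinstance(entry, tuple) else (entry,)
        rows.append([to_int(c) for c in members])
    return rows


labels = ["ao_col", "ga_gam", "gw"]  # assumed label order, for illustration only
print(cohorts_to_rows([(0, 1), (2, 1)]))                      # [[0, 1], [2, 1]]
print(cohorts_to_rows([("ao_col", "ga_gam", "gw")], labels))  # [[0, 1, 2]]
print(cohorts_to_rows(["gw"], labels))                        # [[2]]
```

Each result has one row per entry in `cohorts` and one column per tuple element (a single column for bare values), which matches the shape `(len(cohorts), tuple_len)` stated in the Returns section.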

codecov-io · 2020-11-24T14:56:31Z

Codecov Report

Merging #394 (189c964) into master (3b13a7b) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master      #394   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           32        33    +1     
  Lines         2246      2276   +30     
=========================================
+ Hits          2246      2276   +30

Impacted Files	Coverage Δ
sgkit/cohorts.py	`100.00% <100.00%> (ø)`
sgkit/stats/popgen.py	`100.00% <100.00%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

tomwhite added 5 commits November 20, 2020 11:52

Factor out a test method to create cohorts

92cbb3c

Generalise Fst and PBS tests to test variable numbers of cohorts

8078bf8

Cohort utilities

55e8e89

Cohort subsets for Garud H

74a9229

Cohort subsets for PBS

a97b043

tomwhite force-pushed the cohort-subsets branch from 4afaaa6 to a97b043 November 23, 2020 13:01

tomwhite marked this pull request as ready for review November 23, 2020 13:02

jeromekelleher approved these changes Nov 24, 2020

tomwhite added the auto-merge label Nov 24, 2020

Add example to doc for _cohorts_to_array

2332fe7

tomwhite force-pushed the cohort-subsets branch from f4b45ea to 2332fe7 November 24, 2020 14:48

Merge branch 'master' into cohort-subsets

189c964

mergify bot merged commit d476da4 into sgkit-dev:master Nov 24, 2020

tomwhite mentioned this pull request Nov 30, 2020

Method for grouping samples which is understood by library functions #224

Closed
