Cohort subsets #394
Branch updated from 4afaaa6 to a97b043.
I've now run this successfully on the MalariaGEN data.
H stats: https://nbviewer.jupyter.org/github/tomwhite/shiny-train/blob/sgkit/notebooks/gwss/sgkit_h12.ipynb
Example usage:
This is powerful stuff, thanks @tomwhite! A minor comment above, merge away as you see fit.
sgkit/cohorts.py
Returns
-------
An array of shape ``(len(cohorts), tuple_len)``, where ``tuple_len`` is the length
of the tuples, or 1 if ``cohorts`` is a sequence of values.,
Trailing comma.
tuples to an array of ints used to match samples in ``sample_cohorts``.

Cohorts can be specified by index (as used in ``sample_cohorts``), or a label, in
which case an ``index`` must be provided to find index locations for cohorts.
Any chance of a couple of simple examples here, with their return values? I'm finding it a bit abstract, and a concrete example would help me understand what the function does.
Thanks for the review @jeromekelleher. Added two examples in pystatgen/sgkit@f4b45ea
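The conversion described in the docstring above can be illustrated with a small, self-contained sketch. It is a hypothetical reimplementation written for this thread (the name `cohorts_to_array_sketch` is made up), not the code in sgkit/cohorts.py, and it assumes cohort labels are unique in the pandas Index used to look them up.

```python
from typing import Hashable, Optional, Sequence, Tuple, Union

import numpy as np
import pandas as pd

Cohort = Union[Hashable, Tuple[Hashable, ...]]


def cohorts_to_array_sketch(
    cohorts: Sequence[Cohort], index: Optional[pd.Index] = None
) -> np.ndarray:
    """Hypothetical sketch: convert cohorts (or cohort tuples) to a 2D int array.

    Plain values become a single column; tuples must all have the same length.
    If ``index`` is given, labels are mapped to integer positions with
    ``pd.Index.get_loc`` (assumes the index labels are unique).
    """
    # Normalise every entry to a tuple so plain values give tuple_len == 1.
    rows = [c if isinstance(c, tuple) else (c,) for c in cohorts]
    if len({len(r) for r in rows}) > 1:
        raise ValueError("Cohort tuples must all be the same length")
    if index is not None:
        # Replace each label with its integer location in the index.
        rows = [tuple(index.get_loc(label) for label in r) for r in rows]
    return np.array(rows, dtype=np.int64)


# Plain indexes give one column; tuples give one column per tuple element.
print(cohorts_to_array_sketch([0, 2]).tolist())            # [[0], [2]]
print(cohorts_to_array_sketch([(0, 1), (2, 1)]).tolist())  # [[0, 1], [2, 1]]

# Labels are looked up in an index to find their integer positions.
labels = pd.Index(["c0", "c1", "c2"])
print(cohorts_to_array_sketch([("c0", "c1"), ("c2", "c1")], labels).tolist())  # [[0, 1], [2, 1]]
```

Normalising plain values to 1-tuples is what gives the single-column ``(len(cohorts), 1)`` shape for a sequence of values, so callers can always treat the result as a 2D array.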
Branch updated from f4b45ea to 2332fe7.
Codecov Report
@@            Coverage Diff            @@
##            master     #394    +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           32        33     +1
  Lines         2246      2276    +30
=========================================
+ Hits          2246      2276    +30
This addresses https://github.com/pystatgen/sgkit/issues/224#issuecomment-694350208 by adding a way to specify cohort subsets (for Garud's H and PBS).
This is a draft since I haven't run it on MalariaGEN scale data yet, but I'd welcome any initial feedback.
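PBS is defined over a triple of cohorts, which is one reason to let callers pick cohort subsets rather than computing every combination. The sketch below is a hypothetical illustration, not the code in this PR: it assumes a precomputed, symmetric pairwise Fst matrix for a single window, uses the standard definition T = -log(1 - Fst) and PBS = (T_ab + T_ac - T_bc) / 2, and the function name `pbs_for_cohort_triples` is made up.

```python
import numpy as np


def pbs_for_cohort_triples(fst: np.ndarray, triples: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: PBS for selected cohort triples in one window.

    fst: symmetric pairwise Fst matrix, shape (n_cohorts, n_cohorts), zero diagonal.
    triples: int array, shape (n_triples, 3); row (a, b, c) gives PBS for
    the branch leading to cohort a.
    """
    # Convert Fst to divergence times: T = -log(1 - Fst).
    t = -np.log(1 - fst)
    a, b, c = triples[:, 0], triples[:, 1], triples[:, 2]
    # PBS_a = (T_ab + T_ac - T_bc) / 2, evaluated only for the requested triples.
    return (t[a, b] + t[a, c] - t[b, c]) / 2


fst = np.array(
    [
        [0.0, 0.2, 0.3],
        [0.2, 0.0, 0.1],
        [0.3, 0.1, 0.0],
    ]
)
# Only the triple (0, 1, 2) is requested, so one value is returned.
print(pbs_for_cohort_triples(fst, np.array([[0, 1, 2]])))
```

Passing only the triples of interest keeps the work proportional to the number of requested triples, rather than to every ordered triple of cohorts, which grows cubically with the number of cohorts.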