-
Notifications
You must be signed in to change notification settings - Fork 35
Add Hudson estimator to Fst, and make it the default #302
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Compute Fst from single divergence matrix which has diversity values on the diagonal. Fixes sgkit-dev#292
@eric-czech any chance you can take a look at this one? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure thing @tomwhite, LGTM
sgkit/tests/test_popgen.py
Outdated
np.testing.assert_allclose(div, ts_div) | ||
|
||
|
||
@pytest.mark.parametrize( | ||
"size, n_cohorts", | ||
[(2, 2), (3, 2), (10, 2), (100, 2)], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the set of of size / n_cohort pairs so different for the Hudson vs Nei tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking a look @eric-czech!
Hudson is tested by comparing it to scikit-allel, which only allows pairs of cohorts (populations), whereas Nei is compared to tskit, which allows any number of cohorts (and considers them in pairs). I've added a note to the test saying this.
sgkit/tests/test_popgen.py
Outdated
"size, n_cohorts", | ||
[(2, 2), (3, 2), (10, 2), (100, 2)], | ||
) | ||
def test_Fst__Hudson(size, n_cohorts): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth making chunking a part of the tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, definitely. Added a chunks parameter to all popgen tests. This exposed an issue in the divergence code, which I've now fixed.
Codecov Report
@@ Coverage Diff @@
## master #302 +/- ##
==========================================
- Coverage 97.61% 96.35% -1.26%
==========================================
Files 26 26
Lines 1843 1866 +23
==========================================
- Hits 1799 1798 -1
- Misses 44 68 +24
Continue to review full report at Codecov.
|
Also, compute Fst from single divergence matrix which has diversity values on the diagonal. This idea is from tskit, which does the same thing. The advantage of this approach will come with windowing, since only one array (divergence) will need to be windowed, rather than two (diversity and divergence).
Fixes #292