Add Hudson estimator to Fst, and make it the default #302

tomwhite · 2020-10-05T10:33:10Z

Also, compute Fst from a single divergence matrix that has diversity values on the diagonal. This idea comes from tskit, which does the same thing. The advantage of this approach will show up with windowing, since only one array (divergence) needs to be windowed, rather than two (diversity and divergence).
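As a rough illustration of the single-matrix idea, here is a minimal sketch that assumes the Hudson-style form Fst = 1 - H_within / H_between, where H_within is the mean of the two diagonal (diversity) entries and H_between is the off-diagonal (divergence) entry. The function name `hudson_fst_from_divergence` and the toy values are hypothetical, and the exact expression used in sgkit may differ.

```python
import numpy as np


def hudson_fst_from_divergence(d):
    """Pairwise Fst from a square matrix with diversity on the diagonal.

    Hypothetical helper for illustration only, not sgkit's implementation.
    d[i, i] is the diversity of cohort i; d[i, j] is the divergence
    between cohorts i and j.
    """
    d = np.asarray(d, dtype=float)
    n_cohorts = d.shape[0]
    # Fst of a cohort with itself is left undefined (NaN).
    fst = np.full((n_cohorts, n_cohorts), np.nan)
    for i in range(n_cohorts):
        for j in range(n_cohorts):
            if i != j:
                within = (d[i, i] + d[j, j]) / 2  # mean within-cohort diversity
                fst[i, j] = 1 - within / d[i, j]  # assumed Hudson-style ratio
    return fst


# Toy matrix: diversities 0.2 and 0.3, divergence 0.5.
d = np.array([[0.2, 0.5], [0.5, 0.3]])
fst = hudson_fst_from_divergence(d)
```

Because both the numerator and the denominator are read from the same array, a windowed version would only need to window `d`.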

Fixes #292

Compute Fst from single divergence matrix which has diversity values on the diagonal. Fixes sgkit-dev#292

tomwhite · 2020-10-12T09:20:05Z

@eric-czech any chance you can take a look at this one?

eric-czech

sure thing @tomwhite, LGTM

eric-czech · 2020-10-12T13:25:50Z

sgkit/tests/test_popgen.py

    np.testing.assert_allclose(div, ts_div)


+@pytest.mark.parametrize(
+    "size, n_cohorts",
+    [(2, 2), (3, 2), (10, 2), (100, 2)],


Why is the set of size / n_cohorts pairs so different for the Hudson vs Nei tests?

tomwhite · 2020-10-13

Thanks for taking a look @eric-czech!

Hudson is tested by comparing it to scikit-allel, which only allows pairs of cohorts (populations), whereas Nei is compared to tskit, which allows any number of cohorts (and considers them in pairs). I've added a note to the test saying this.
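To make the "pairs only" constraint concrete, here is a small sketch of how a reference that accepts exactly two cohorts could be applied to more cohorts by enumerating pairs. The names `pairwise_reference` and `toy_stat` are hypothetical stand-ins, not part of scikit-allel, tskit, or sgkit.

```python
from itertools import combinations


def pairwise_reference(cohorts, pair_stat):
    """Apply a two-cohort statistic to every unordered pair of cohorts."""
    results = {}
    for i, j in combinations(range(len(cohorts)), 2):
        results[(i, j)] = pair_stat(cohorts[i], cohorts[j])
    return results


def toy_stat(a, b):
    """Placeholder two-cohort statistic: absolute difference of means."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


cohorts = [[0, 0, 1], [1, 1, 1], [0, 1, 1]]
out = pairwise_reference(cohorts, toy_stat)
```

With three cohorts this calls the two-cohort function three times, once per pair, which is the extra work that testing with `n_cohorts` fixed at 2 avoids.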

pystatgen/sgkit@7cc6493

eric-czech · 2020-10-12T13:26:05Z

sgkit/tests/test_popgen.py

+    "size, n_cohorts",
+    [(2, 2), (3, 2), (10, 2), (100, 2)],
+)
+def test_Fst__Hudson(size, n_cohorts):


Worth making chunking a part of the tests?

tomwhite · 2020-10-13

Yes, definitely. I added a chunks parameter to all popgen tests. This exposed an issue in the divergence code, which I've now fixed.
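The kind of check that a chunks parameter enables can be sketched with plain NumPy: a statistic summed over variants should give the same answer whether the variants dimension is processed whole or in pieces. The functions below are hypothetical toy code, not sgkit's divergence implementation, and they do not reproduce the specific issue that was fixed.

```python
import numpy as np


def total_difference(ac1, ac2):
    """Toy per-variant statistic summed over all variants."""
    return float(np.sum(np.abs(ac1 - ac2)))


def chunked_total_difference(ac1, ac2, chunk_size):
    """Same statistic, computed chunk by chunk along the variants axis."""
    total = 0.0
    for start in range(0, ac1.shape[0], chunk_size):
        stop = start + chunk_size
        # Partial results must be combined across chunks, not just kept per chunk.
        total += total_difference(ac1[start:stop], ac2[start:stop])
    return total


rng = np.random.default_rng(0)
ac1 = rng.integers(0, 10, size=100)
ac2 = rng.integers(0, 10, size=100)

expected = total_difference(ac1, ac2)
for chunk_size in (1, 7, 100):
    np.testing.assert_allclose(
        chunked_total_difference(ac1, ac2, chunk_size), expected
    )
```

Chunk sizes that do not divide the number of variants evenly (7 here) are worth including, since the uneven final chunk is handled on a separate path from the full-size ones.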

pystatgen/sgkit@c9338ff

codecov-io · 2020-10-13T13:22:10Z

Codecov Report

Merging #302 into master will decrease coverage by 1.25%.
The diff coverage is 39.02%.

@@            Coverage Diff             @@
##           master     #302      +/-   ##
==========================================
- Coverage   97.61%   96.35%   -1.26%     
==========================================
  Files          26       26              
  Lines        1843     1866      +23     
==========================================
- Hits         1799     1798       -1     
- Misses         44       68      +24

| Impacted Files | Coverage Δ | |
|---|---|---|
| sgkit/stats/popgen.py | `65.60% <39.02%> (-15.78%)` | ⬇️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 045acad...ee34283.

Add hudson estimator to Fst, and make it the default.

8053bdc

Compute Fst from single divergence matrix which has diversity values on the diagonal. Fixes sgkit-dev#292

tomwhite requested a review from jeromekelleher October 5, 2020 10:33

tomwhite mentioned this pull request Oct 5, 2020

Fst windowing #303

Merged

Don't hardcode variable name

626785b

eric-czech approved these changes Oct 12, 2020

eric-czech reviewed Oct 12, 2020

tomwhite added 2 commits October 12, 2020 16:39

Be explicit that scikit-allel can only calculate Fst for pairs of coh…

7cc6493

…orts

Test popgen functions on data chunked in variants dimension.

c9338ff

tomwhite added the auto-merge label (Auto merge label for mergify test flight) Oct 13, 2020

Merge branch 'master' into fst-hudson

ee34283

mergify bot merged commit b83ca1b into sgkit-dev:master Oct 13, 2020
