-
Notifications
You must be signed in to change notification settings - Fork 35
Fst function #225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
ps. Also I find having a capital letter in the function name ugly, but I'm not sure what the alternative is for these popgen stats. |
This should be done once the basic popgen stats are implemented (#100). |
I'd like to add the estimator described by Harris and DeGiorgio (2016) to the list at some point. Their estimator is based on Hudson's but attempts to correct for related and inbred individuals using kinship. |
Just using lowercase would be an option (scikit-allel does). |
I noticed that the implementation of Fst we have at the moment (based on tskit) is defined as
whereas scikit-allel's
Does tskit's Fst correspond to one of the other implementations from scikit-allel, or is it a separate one entirely? Should we change Fst in sgkit to return results that are the same as scikit-allel's |
Good question @tomwhite! I've opened an issue on tskit (tskit-dev/tskit#858) to clarify this. I'm guessing that we should be using Hudson's estimator in sgkit by default (see above for suggested notation for specifying different estimators), and so we should change to using scikit-allel's definition (and test against |
Thanks @jeromekelleher. I've opened #292 to implement this. What name do you think we should use for the tskit estimator? The linked discussion had a lot of names! |
Sounds like Nei1986 is the right name for it, but we'll need to confirm that. Maybe we could have a comment in the code linking to the tskit issue? |
Fst is one of the fundamental building blocks of population structure analysis. We will want to compute Fst between pairs of populations, with the "population" labels designated by some properties of the input dataset (see #224).
There are a number of different estimators for Fst (scikit-allel implements 3), so we should provide a method to specify the estimator for the statistic as a parameter. I suggest something like the following:
These correspond to the three definitions in scikit-allele. We may not want all three initially, and just implementing the Hudson estimator may be sufficient. We can test our implementations by comparing with scikit-allele and tskit
(ps. I prefer to use
None
as the default value forestimator
, as there may be situations in the future where we might prefer to have a different default depending on properties of the dataset. If we leaveestimator="hudson"
in the signature, then there's no way to tell if the user just wants the default or has specifically asked for "hudson". In general, unless we're totally sure that the default is never going to change, I think it's better to useNone
as the default value in the signature.)The text was updated successfully, but these errors were encountered: