@@ -213,6 +213,31 @@ def allele_frequency(ds: Dataset) -> Dataset:
 
 
 def variant_stats(ds: Dataset, merge: bool = True) -> Dataset:
+    """Compute quality control variant statistics from genotype calls.
+
+    Parameters
+    ----------
+    ds : Dataset
+        Genotype call dataset such as from
+        `sgkit.create_genotype_call_dataset`.
+    merge : bool, optional
+        If True (the default), merge the input dataset and the computed variables into
+        a single dataset, otherwise return only the computed variables.
+
+    Returns
+    -------
+    Dataset
+        A dataset containing the following variables:
+        - `variant_n_called` (variants): The number of samples with called genotypes.
+        - `variant_call_rate` (variants): The fraction of samples with called genotypes.
+        - `variant_n_het` (variants): The number of samples with heterozygous calls.
+        - `variant_n_hom_ref` (variants): The number of samples with homozygous reference calls.
+        - `variant_n_hom_alt` (variants): The number of samples with homozygous alternate calls.
+        - `variant_n_non_ref` (variants): The number of samples that are not homozygous reference calls.
+        - `variant_allele_count` (variants, alleles): The number of occurrences of each allele.
+        - `variant_allele_total` (variants): The number of occurrences of all alleles.
+        - `variant_allele_frequency` (variants, alleles): The frequency of occurrence of each allele.
+    """
     new_ds = xr.merge(
         [
             call_rate(ds, dim="samples"),
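
To make the quantities listed in the docstring concrete, here is a minimal pure-Python sketch that computes them for a single variant. It is not the implementation in this PR: the function name `variant_stats_sketch`, the tuple-based genotype encoding, and the use of `-1` for a missing allele are all assumptions made for illustration. In particular, the sketch treats a sample as called only when none of its alleles is missing, and treats "homozygous alternate" as identical non-reference alleles, which may differ from the library's behaviour at multi-allelic sites.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical encoding: allele indices for one sample; -1 marks a missing allele.
Genotype = Tuple[int, ...]


def variant_stats_sketch(calls: Sequence[Genotype], n_alleles: int) -> Dict[str, object]:
    """Summarise the genotype calls of one variant (illustrative only)."""
    # Assumption: a sample is "called" only if no allele is missing.
    called = [g for g in calls if all(a >= 0 for a in g)]
    n_called = len(called)

    n_hom_ref = sum(1 for g in called if all(a == 0 for a in g))
    # Assumption: homozygous alternate means identical, non-reference alleles.
    n_hom_alt = sum(1 for g in called if g[0] > 0 and len(set(g)) == 1)
    n_het = sum(1 for g in called if len(set(g)) > 1)

    # Count alleles from called samples only.
    allele_count: List[int] = [0] * n_alleles
    for g in called:
        for a in g:
            allele_count[a] += 1
    allele_total = sum(allele_count)

    return {
        "variant_n_called": n_called,
        "variant_call_rate": n_called / len(calls) if calls else float("nan"),
        "variant_n_het": n_het,
        "variant_n_hom_ref": n_hom_ref,
        "variant_n_hom_alt": n_hom_alt,
        "variant_n_non_ref": n_called - n_hom_ref,
        "variant_allele_count": allele_count,
        "variant_allele_total": allele_total,
        "variant_allele_frequency": [
            c / allele_total if allele_total else float("nan") for c in allele_count
        ],
    }


# Four diploid samples at a biallelic site; the last sample is uncalled.
stats = variant_stats_sketch([(0, 0), (0, 1), (1, 1), (-1, -1)], n_alleles=2)
```

For the four samples above, three are called, with one each of homozygous reference, heterozygous, and homozygous alternate, so the call rate is 0.75 and both alleles have a frequency of 0.5.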