Commit 0f17c53

Doc
1 parent 88c08b9 commit 0f17c53

1 file changed

+25
-0
lines changed

sgkit/stats/aggregation.py

@@ -213,6 +213,31 @@ def allele_frequency(ds: Dataset) -> Dataset:
 
 
 def variant_stats(ds: Dataset, merge: bool = True) -> Dataset:
+    """Compute quality control variant statistics from genotype calls.
+
+    Parameters
+    ----------
+    ds : Dataset
+        Genotype call dataset such as from
+        `sgkit.create_genotype_call_dataset`.
+    merge : bool, optional
+        If True (the default), merge the input dataset and the computed variables into
+        a single dataset, otherwise return only the computed variables.
+
+    Returns
+    -------
+    Dataset
+        A dataset containing the following variables:
+        - `variant_n_called` (variants): The number of samples with called genotypes.
+        - `variant_call_rate` (variants): The fraction of samples with called genotypes.
+        - `variant_n_het` (variants): The number of samples with heterozygous calls.
+        - `variant_n_hom_ref` (variants): The number of samples with homozygous reference calls.
+        - `variant_n_hom_alt` (variants): The number of samples with homozygous alternate calls.
+        - `variant_n_non_ref` (variants): The number of samples that are not homozygous reference calls.
+        - `variant_allele_count` (variants, alleles): The number of occurrences of each allele.
+        - `variant_allele_total` (variants): The number of occurrences of all alleles.
+        - `variant_allele_frequency` (variants, alleles): The frequency of occurrence of each allele.
+    """
     new_ds = xr.merge(
         [
             call_rate(ds, dim="samples"),
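
To make the documented variables concrete, here is a minimal pure-Python sketch of how such per-variant statistics could be derived for a single variant. It is not sgkit's implementation: the helper name `variant_stats_sketch`, the encoding of a missing allele as `-1`, the restriction to diploid calls, and the rule that a sample counts as called only when both alleles are present are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def variant_stats_sketch(
    calls: Sequence[Tuple[int, int]], n_alleles: int = 2
) -> Dict[str, object]:
    """Hypothetical helper: summarise diploid calls for ONE variant.

    Each call is an (allele, allele) pair; 0 is the reference allele,
    positive integers are alternate alleles, and -1 marks a missing allele
    (an assumption of this sketch).
    """
    # Assumption: a sample is "called" only if neither allele is missing.
    called = [c for c in calls if all(a >= 0 for a in c)]
    n_samples = len(calls)
    n_called = len(called)

    n_het = sum(1 for a, b in called if a != b)
    n_hom_ref = sum(1 for a, b in called if a == b == 0)
    n_hom_alt = sum(1 for a, b in called if a == b and a > 0)
    # Any called sample carrying at least one alternate allele.
    n_non_ref = sum(1 for c in called if any(a > 0 for a in c))

    # Count every non-missing allele observed across all samples.
    counts = Counter(a for c in calls for a in c if a >= 0)
    allele_count: List[int] = [counts.get(i, 0) for i in range(n_alleles)]
    allele_total = sum(allele_count)
    allele_frequency = [
        n / allele_total if allele_total else float("nan") for n in allele_count
    ]

    return {
        "variant_n_called": n_called,
        "variant_call_rate": n_called / n_samples if n_samples else float("nan"),
        "variant_n_het": n_het,
        "variant_n_hom_ref": n_hom_ref,
        "variant_n_hom_alt": n_hom_alt,
        "variant_n_non_ref": n_non_ref,
        "variant_allele_count": allele_count,
        "variant_allele_total": allele_total,
        "variant_allele_frequency": allele_frequency,
    }


# Four samples: hom-ref, het, hom-alt, and one fully missing call.
example = variant_stats_sketch([(0, 0), (0, 1), (1, 1), (-1, -1)])
```

For the four example samples, three are called (call rate 0.75), with one heterozygous, one homozygous reference, and one homozygous alternate call. Each allele is observed three times out of six, giving a frequency of 0.5 for both.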
