Use of int8 for call_genotype results in integer overflow with complex variants #640

timothymillar · 2021-07-26T02:47:13Z

I'm working with some microhaplotype variant calls in which the number of alleles can be >300 (likely due to poor quality calls at some loci). It would be ideal if the call_genotype dtype were configurable and/or automatically set based on the max_alt_alleles parameter. Similar discussion in #584.
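
For illustration, the wrap-around described above can be reproduced with plain NumPy. This is a minimal sketch, not sgkit code: the `allele_indices` array is made-up data, and it only shows what happens when allele indices above 127 are stored as int8.

```python
import numpy as np

# Hypothetical allele indices for a locus with more than 300 alleles.
allele_indices = np.array([0, 1, 127, 128, 200, 300], dtype=np.int64)

# Casting to int8 silently wraps anything outside [-128, 127].
as_int8 = allele_indices.astype(np.int8)

# A wider signed dtype keeps the same indices intact.
as_int16 = allele_indices.astype(np.int16)

print(as_int8.tolist())
print(as_int16.tolist())
```

Indices that wrap to negative values are especially risky if negative numbers are also used as sentinels (for example, a missing-call marker).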

jeromekelleher · 2021-07-26T09:39:33Z

+1 - this is definitely an issue. Setting based on max_alt_alleles is good, and I think that seals the deal on mapping any alleles we can't represent to missing data.
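
A rough sketch of the idea, assuming a missing-call sentinel of -1: pick the smallest signed integer dtype that can hold max_alt_alleles, and map any allele index above that limit to missing. The function names `smallest_genotype_dtype` and `encode_genotypes` are hypothetical and this is not necessarily how the change was eventually implemented.

```python
import numpy as np


def smallest_genotype_dtype(max_alt_alleles: int) -> np.dtype:
    """Return the smallest signed integer dtype that can store max_alt_alleles.

    Index 0 is the reference allele, so the largest index is max_alt_alleles.
    Signed types are used so that negative sentinels remain available.
    """
    for candidate in (np.int8, np.int16, np.int32, np.int64):
        if max_alt_alleles <= np.iinfo(candidate).max:
            return np.dtype(candidate)
    raise ValueError(f"max_alt_alleles too large: {max_alt_alleles}")


def encode_genotypes(allele_indices, max_alt_alleles: int, missing: int = -1):
    """Store allele indices in the chosen dtype; unrepresentable alleles become missing."""
    dtype = smallest_genotype_dtype(max_alt_alleles)
    indices = np.asarray(allele_indices, dtype=np.int64)
    # Alleles beyond the configured maximum cannot be represented.
    bounded = np.where(indices > max_alt_alleles, missing, indices)
    return bounded.astype(dtype)


calls = encode_genotypes([0, 1, 5, 300], max_alt_alleles=3)
wide_calls = encode_genotypes([0, 300], max_alt_alleles=300)
```

With max_alt_alleles=3 the result fits in int8 and the indices 5 and 300 are stored as -1; with max_alt_alleles=300 the dtype widens to int16 and index 300 is preserved.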

tomwhite · 2021-07-26T15:30:09Z

> It would be ideal if the call_genotype dtype was configurable and/or automatically set based on the max_alt_alleles parameter.

+1

BTW I just opened #643, which is tangentially related. @timothymillar I wonder if you see a few very long alleles in your data?

timothymillar · 2021-07-26T21:19:46Z

> I wonder if you see a few very long alleles in your data

I'm currently working with alleles of length 120 bp. These are fixed-size "chunks" across the genome (targeted sequencing), so fixed-length strings are suitable. But I think we'll use more variable allele lengths in future, as these chunks need some tuning.
We also use freebayes quite regularly, which can produce highly variable allele lengths. However, most of the data I'm working with is targeted sequencing, so the datasets are quite small in the variants dimension.

tomwhite · 2021-07-27T08:22:38Z

Good to know - thanks.

jeromekelleher mentioned this issue Jul 26, 2021

Issue a warning if the number of alt alleles exceeds the maximum specified #620

Merged

tomwhite added the "data representation" label (issues related to how data is represented: data types, data structures, indexes, access methods, etc.) Jul 26, 2021

timothymillar added a commit to timothymillar/sgkit that referenced this issue Sep 26, 2021

Set call_genotype dtype based on max_alt_alleles sgkit-dev#640

08e20ba

timothymillar mentioned this issue Sep 26, 2021

Set call_genotype dtype based on max_alt_alleles #640 #686

Merged

mergify bot closed this as completed in #686 Oct 1, 2021

mergify bot pushed a commit that referenced this issue Oct 1, 2021

Set call_genotype dtype based on max_alt_alleles #640

d3b77ed
