Skip to content

VCF compression size test #941

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

tomwhite
Copy link
Collaborator

This uses data from #925 in a test that asserts that the Zarr size is less than 75% of the gzipped VCF. It could be made into a benchmark (asv or otherwise, see #938), but for the moment it was simplest to write it as a regular unit test.

When writing this I found a bug/limitation (f7d44a8) where the Zarr chunk size in the samples dimension is set to 1000 (the default), and not limited to the number of samples. This would be potentially wasteful and slow for VCFs with just a few samples. @benjeffery I wonder if this change helps speed up your TB VCF import?

@tomwhite tomwhite marked this pull request as ready for review October 25, 2022 10:15
@jeromekelleher
Copy link
Collaborator

Hmm, 17Mb is very big to add to the repo. Presumably this unit test takes a perceivable amount of time also - I wonder if this is the right format format for this benchmark?

@tomwhite
Copy link
Collaborator Author

Hmm, 17Mb is very big to add to the repo. Presumably this unit test takes a perceivable amount of time also - I wonder if this is the right format format for this benchmark?

You're right - maybe let's not merge this then, but use it as the basis for a benchmark when we've decided how to run them. (BTW we already have a 13MB test file in the repo, but good not to add too many of these.)

I'll move the chunk fix into its own PR.

@tomwhite tomwhite marked this pull request as draft November 1, 2022 09:45
@tomwhite tomwhite force-pushed the vcf-compression-size-test branch from 5c3ff56 to 2aa5767 Compare November 1, 2022 09:46
@tomwhite
Copy link
Collaborator Author

Closed in favour of #978

@tomwhite tomwhite closed this Dec 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants