Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

haplocheck and haplogrep issues #14

Open
squigzzz opened this issue Dec 6, 2023 · 3 comments
Open

haplocheck and haplogrep issues #14

squigzzz opened this issue Dec 6, 2023 · 3 comments

Comments

@squigzzz
Copy link

squigzzz commented Dec 6, 2023

there was a pretty significant change made to the code in the past 2 months which got haplocheck working properly, not sure what that was but thank you for that.

I wanted to drop some notes here based on my research, according to the haplocheck documentation https://mitoverse.readthedocs.io/haplocheck/haplocheck/, it requires vcf genotypes to be encoded as 0/1 for heteroplasmic sites and 1/1 for homoplasmic sites however, the way mutect2 is currently being implemented, its not using mitochondria mode which means every variant site is being encoded as homozygous or heterozygous as it is assuming these sequences are from the nuclear genome and are diploid. When you ingest this mutect2 generated vcf into haplocheck nearly every sample will come back as contaminated since every site is essentially heteroplasmic. In addition, within the mitoHPC publication itself within your figure 2 flowchart you mention how haplocheck should be used only with the rCRS reference however in the code implementation if you try to run the pipeline using rCRS instead of chrM it crashes with a samtools header error. As such I believe the mutect2 output should never be used with haplocheck, atleast not in its current implementation. Would appreciate any feedback, comments. Thanks.

More practically, I didnt observe any major differences when using the chrM reference as opposed to rCRS for calling variants using mutserve.

@squigzzz
Copy link
Author

squigzzz commented Dec 8, 2023

also as a follow up to this, I am curious as to why mitochondria mode is not being used with the mutect2 implementation you have currently embedded. There are several advantages to this pipeline, but perhaps the entire logic of a sample specific reference needs to be rebuilt.

@dpuiu
Copy link
Owner

dpuiu commented Dec 8, 2023

Hi squigzzz and thank you so much for your feedback!

Sure, we will try to update the pipeline, set "GT=1/1" for the homoplasmic calls and further check the haplocheck functionality.

Tests on simulated data performed better when using plain "Mutect2" than "Mutect2 --mitochondria-mode" . The latter, would miss some of the simulated SNVs, it is why we did not use it.

However, the "mitochondria-mode" can easily be enabled by setting
export HP_GOPT="--mitochondria-mode"
in the "init.sh" file

@squigzzz
Copy link
Author

squigzzz commented Dec 8, 2023

Thank you for getting back to me so quickly, I greatly appreciate it. I am also curious whether in the current implementation therer is any filtering thats being done to the raw VCFs prior to ingesting them into haplocheck. Since some of the variant calls could be annotated as NUMTS or other sequencing artifacts. Would it be wise to perform some level of initial QC on the VCFs prior to ingesting them into haplocheck ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants