gnomAD v4.1 SV database contains mix of SV and CNV - svdb problem #615

Open
fa2k opened this issue Sep 20, 2024 · 0 comments
Labels
bug Something isn't working

fa2k commented Sep 20, 2024

Description of the bug

gnomAD SV v4.1 (https://gnomad.broadinstitute.org/news/2023-11-v4-structural-variants/) contains some CNVs that have no AC or AF information in the VCF (gnomad.v4.1.sv.sites.vcf.gz).

svdb --query refuses to annotate the VCF if the tags given to --in_occ or --in_frq (in this case AC and AF) are missing for some variants in the database, and it produces an output VCF without any variants.

It works if I remove the lines without AC / AF from the gnomAD file, but that means we discard the gnomAD information for those CNVs. Would it make sense to somehow integrate this CNV information into the annotation of SVs instead?
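For reference, the workaround of dropping records without AC / AF could look roughly like the sketch below. It assumes the tags appear as `AC=` and `AF=` keys in the INFO column (column 8) of a tab-separated VCF; the function name `filter_vcf_with_ac_af` and the output file name in the usage comment are made up for illustration and are not part of svdb or the pipeline.

```shell
#!/usr/bin/env bash
# Sketch: keep VCF header lines plus only those records whose INFO column
# carries both an AC= and an AF= key. Reads a VCF on stdin, writes to stdout.
# Assumption: plain-text, tab-separated VCF with INFO in column 8.
filter_vcf_with_ac_af() {
    awk -F'\t' '
        /^#/ { print; next }                      # pass header lines through
        {
            has_ac = 0; has_af = 0
            n = split($8, kv, ";")                # split INFO into key=value items
            for (i = 1; i <= n; i++) {
                if (kv[i] ~ /^AC=/) has_ac = 1    # anchored, so e.g. FOO_AC= is ignored
                if (kv[i] ~ /^AF=/) has_af = 1
            }
            if (has_ac && has_af) print           # drop records missing either tag
        }
    '
}

# Hypothetical usage (output file name is illustrative only):
#   zcat gnomad.v4.1.sv.sites.vcf.gz | filter_vcf_with_ac_af | bgzip > gnomad.v4.1.sv.sites.ac_af.vcf.gz
```

This only reproduces the lossy workaround: the records it drops are exactly the CNV entries whose information the question above is about.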

Command used and terminal output

NFCORE_RAREDISEASE:RAREDISEASE:ANNOTATE_STRUCTURAL_VARIANTS:SVDB_QUERY_DB command.sh:

#!/bin/bash -euo pipefail
svdb \
    --merge \
    --pass_only --same_order \
    --priority tiddit,manta,cnvnator \
    --vcf  NA12878_tiddit.vcf.gz:tiddit NA12878_manta.diploid_sv.vcf.gz:manta NA12878_cnvnator.vcf.gz:cnvnator \
    > NA12878_sv.vcf
bgzip NA12878_sv.vcf

cat <<-END_VERSIONS > versions.yml
"NFCORE_RAREDISEASE:RAREDISEASE:CALL_STRUCTURAL_VARIANTS:SVDB_MERGE":
    svdb: $( echo $(svdb) | head -1 | sed 's/usage: SVDB-\([0-9]\.[0-9]\.[0-9]\).*/\1/' )
    samtools: $(echo $(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*$//')
END_VERSIONS

--------


Output (command.log): 

INFO:    Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
Error: frequency or hit tag not found! Make sure to set the --in_occ AND --in_frq to the number and frequency of alleles/individuals as presented in the INFO column of the input db

database variants not having the --in_occ or --in_frq tag must be removed
you may also skip these parameters and cluster based on the GT entry of the format column (if such exists)

Relevant files

No response

System information

No response

fa2k added the bug label Sep 20, 2024