Detect AMR mutations in Salmonella Paratyphi A genomes based on VCF or BAM files (mapped to Paratyphi A AKU_12601 reference genome)

zadyson/GenoParatyphi

GenoParatyphi

Please note that GenoParatyphi is no longer updated. The newly developed Paratype tool carries out the same AMR calling functions as GenoParatyphi, and also types sequences according to a newly developed Paratyphi A genotyping framework.

This script detects mutations associated with AMR in Salmonella Paratyphi A genomes, including mutations in the QRDR (quinolone-resistance determining region) of genes gyrA and parC reported previously ["Laboratory and Molecular Surveillance of Paediatric Typhoidal Salmonella in Nepal: Antimicrobial Resistance and Implications for Vaccine Policy", Britto et al 2018, PLoS NTDs], and mutations in acrB reported by Hooda et al 2019 ["Molecular mechanism of azithromycin resistance among typhoidal Salmonella strains in Bangladesh identified through passive pediatric surveillance", Hooda et al 2019].

The name genoparatyphi refers to the Salmonella Typhi genotyping framework described in ["An extended genotyping framework for Salmonella enterica serovar Typhi, the cause of human typhoid", Wong et al, 2016, Nature Communications], with code available at: https://github.com/katholt/genotyphi/.

Genoparatyphi does not currently type Paratyphi A genomes by lineage, nor does it detect acquired genes associated with AMR. For the latter we recommend SRST2, available at: https://github.com/katholt/srst2/.

Inputs are BAM or VCF files (mapped to Paratyphi A AKU_12601 reference genome, accession number FM200053).

For short read data, we recommend using the raw alignments (BAM files) as input (via --bam), since VCFs generated by different pipelines may have filtered out important data at the AMR loci. Note you also need to supply the reference sequence that was used. If you don't have BAMs or you are really confident in the quality of your SNP calls in existing VCF files, you can provide your own VCFs (generated by read mapping) via --vcf.

For assemblies, we recommend using ParSNP (version 1.0.1) to align genomes to the AKU_12601 reference. The resulting multi-sample VCF file(s) can be passed directly to this script via --vcf, with --mode vcf_parsnp.

Dependencies: Python 2.7.5+ with Biopython. SAMtools and BCFtools are also required if you are working from BAM files. Genoparatyphi has been tested with versions 1.1, 1.2, and 1.3 of both SAMtools and BCFtools; we advise using the same version of both tools together, e.g. SAMtools v1.2 with BCFtools v1.2.

Basic Usage

Options

Outputs

Generating input BAMS from reads (with example)

Generating input VCFs from assemblies (with example)

Basic Usage - own BAM (recommended if you have reads)

Note that the BAM files must be sorted (e.g. using samtools sort).

python genoparatyphi.py --mode bam --bam *.bam --ref FM200053.fasta --ref_id FM200053.1 --output genotypes.txt
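When many BAM files are involved, the call above can also be scripted. The following is a minimal sketch and not part of GenoParatyphi itself: the helper names `build_bam_command` and `run_on_directory` are hypothetical, and only options documented in this README are used.

```python
import glob
import subprocess


def build_bam_command(bam_files, ref_fasta, ref_id, output,
                      script="genoparatyphi.py"):
    """Assemble the bam-mode command line as an argument list (hypothetical helper)."""
    if not bam_files:
        raise ValueError("no BAM files supplied")
    cmd = ["python", script, "--mode", "bam", "--bam"]
    cmd.extend(sorted(bam_files))          # one or more sorted BAM files
    cmd.extend(["--ref", ref_fasta,        # reference FASTA used for mapping
                "--ref_id", ref_id,        # chromosome name in the reference
                "--output", output])       # output text file
    return cmd


def run_on_directory(bam_dir, ref_fasta="FM200053.fasta",
                     ref_id="FM200053.1", output="genotypes.txt"):
    """Expand the *.bam glob in Python rather than the shell, then run the script."""
    bams = glob.glob(bam_dir + "/*.bam")
    subprocess.check_call(build_bam_command(bams, ref_fasta, ref_id, output))
```

Building the argument list separately from running it makes it easy to print and check the command before anything is executed.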

Basic Usage - own VCF

python genoparatyphi.py --mode vcf --vcf *.vcf --ref_id FM200053 --output genotypes.txt

Basic Usage - assemblies aligned with ParSNP (recommended if you only have assembly data available and no reads)

python genoparatyphi.py --mode vcf_parsnp --vcf parsnp.vcf --output genotypes.txt

Options

Required options

--mode             Run mode, either bam, vcf or vcf_parsnp

Mode specific options

--mode bam

Requires SAMtools and BCFtools

--bam              Path to one or more BAM files, generated by mapping reads to the Paratyphi A AKU_12601 (FM200053) reference genome.
                   Note the SNP coordinates used here for genotyping are relative to Paratyphi A AKU_12601 (FM200053),
                   so the input MUST be a BAM obtained via mapping to this reference sequence.

--ref              Reference sequence file used for mapping (fasta).

--ref_id           Name of the Paratyphi A AKU_12601 (FM200053) chromosome reference in your VCF file.
                   This is the entry in the first column (#CHROM) of the data part of the file.
                   This is necessary in case you have mapped to multiple replicons (e.g. chromosome and
                   plasmid) and all the results appear in the same VCF file.

--samtools_location     Specify the location of the folder containing the SAMtools installation if not standard/in path [optional]

--bcftools_location     Specify the location of the folder containing the bcftools installation if not standard/in path [optional]
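If you are unsure which value to pass to --ref_id, one option is to list the distinct entries in the first (#CHROM) column of a VCF. This is a minimal sketch; the function name `vcf_chrom_names` and the file name in the usage note are hypothetical.

```python
def vcf_chrom_names(lines):
    """Return the distinct #CHROM values of a VCF, in order of first appearance."""
    seen = []
    for line in lines:
        # Skip meta-information lines, the column header line and blank lines
        if line.startswith("#") or not line.strip():
            continue
        chrom = line.split("\t", 1)[0]
        if chrom not in seen:
            seen.append(chrom)
    return seen
```

For example, `with open("sample.vcf") as fh: print(vcf_chrom_names(fh))` prints the replicon names present in the data part of the file; the chromosome entry is the one to use for --ref_id.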

--mode vcf

--vcf              Path to one or more VCF files, generated by mapping reads to the Paratyphi A AKU_12601 (FM200053) reference genome.
                   Note the SNP coordinates used here for genotyping are relative to Paratyphi A AKU_12601 (FM200053),
                   so the input MUST be a VCF obtained via mapping to this reference sequence.

--mode vcf_parsnp

--vcf              Path to one or more VCF files generated by mapping assemblies to the Paratyphi A AKU_12601 (FM200053) reference genome
                   using ParSNP (--ref_id is optional, default value is '1').

Other options

--phred                 Minimum phred quality to count a variant call vs Paratyphi A AKU_12601 (FM200053) as a true SNP (default 20)

--min_prop              Minimum proportion of reads required to call a SNP (default 0.1)

--output                Specify the location of the output text file (default genotypes_[timestamp].txt)
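To illustrate how thresholds such as --phred and --min_prop can act on a single variant record, here is a hedged sketch. It is not the script's actual code: it assumes the read counts come from a `DP4=ref_fwd,ref_rev,alt_fwd,alt_rev` tag in the VCF INFO column, that both comparisons are inclusive, and that records without DP4 are rejected. The function name `passes_thresholds` is hypothetical.

```python
import re

# Assumption: read support is taken from the DP4 tag in the INFO column
DP4_PATTERN = re.compile(r"DP4=(\d+),(\d+),(\d+),(\d+)")


def passes_thresholds(qual, info, phred=20, min_prop=0.1):
    """Illustrative filter: QUAL must reach `phred` and the fraction of
    alternate-allele reads must reach `min_prop`."""
    if float(qual) < phred:
        return False
    match = DP4_PATTERN.search(info)
    if match is None:
        return False  # assumption: no read counts means the call is not accepted
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in match.groups())
    total = ref_fwd + ref_rev + alt_fwd + alt_rev
    if total == 0:
        return False
    return (alt_fwd + alt_rev) / float(total) >= min_prop
```

With the defaults, a record with QUAL 50 and `DP4=1,1,14,14` passes (28 of 30 reads support the alternate allele), while one with `DP4=20,20,1,1` fails because only 2 of 42 reads do.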

Outputs

Output is to standard out, in tab delimited format.
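Tab-delimited results can be loaded for downstream analysis with Python's standard `csv` module. This sketch assumes only that the first line of the output is a header row; the function name `read_genotype_table` is hypothetical, and the column names are whatever the script writes.

```python
import csv


def read_genotype_table(handle):
    """Parse tab-delimited text with a header row into a list of dicts,
    one dict per sample, keyed by the column names found in the header."""
    return list(csv.DictReader(handle, delimiter="\t"))
```

For example, `with open("genotypes.txt") as fh: rows = read_genotype_table(fh)` gives one dictionary per input file, which can then be filtered or summarised.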

Generating input BAMS from reads

We recommend using Bowtie2 to align reads to the Paratyphi A AKU_12601 (FM200053) reference, and SAMtools to convert the *.sam file to the *.bam format. The resulting bam file(s) can be passed directly to this script via --bam. Please note the differences in the commands listed below when using SAMtools v1.1/1.2 vs. SAMtools v1.3.1 to generate bam files.
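A typical Bowtie2 and SAMtools workflow might look like the following sketch, which only builds the commands and does not run them. File names, the index prefix, and the helper name `mapping_commands` are hypothetical. The version-dependent step is `samtools sort`: v1.1/1.2 accept the legacy form that takes an output prefix, whereas from v1.3 onwards the output file is given with `-o`.

```python
def mapping_commands(ref_fasta, reads_1, reads_2, sample, samtools_version):
    """Return the commands (as argument lists) that take paired-end reads
    to a sorted BAM named <sample>.bam."""
    index = "AKU_12601_index"                 # hypothetical Bowtie2 index prefix
    sam = sample + ".sam"
    unsorted_bam = sample + ".unsorted.bam"
    cmds = [
        ["bowtie2-build", ref_fasta, index],                                # index the reference
        ["bowtie2", "-x", index, "-1", reads_1, "-2", reads_2, "-S", sam],  # map reads to SAM
        ["samtools", "view", "-b", "-S", "-o", unsorted_bam, sam],          # SAM to BAM
    ]
    major_minor = tuple(int(part) for part in samtools_version.split(".")[:2])
    if major_minor >= (1, 3):
        # SAMtools v1.3+: output file is named explicitly with -o
        cmds.append(["samtools", "sort", "-o", sample + ".bam", unsorted_bam])
    else:
        # SAMtools v1.1/1.2 legacy form: second argument is a prefix, giving <sample>.bam
        cmds.append(["samtools", "sort", unsorted_bam, sample])
    return cmds
```

Each returned list can be passed to `subprocess.check_call`, and the resulting `<sample>.bam` supplied to the script via --bam.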

Generating input VCFs from assemblies

We recommend using ParSNP to align genomes to the Paratyphi A AKU_12601 (FM200053) reference. The resulting multi-sample VCF file(s) can be passed directly to this script via --vcf, with --mode vcf_parsnp.
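The assembly route can be sketched as three commands: align with ParSNP, convert the alignment to VCF, then run this script. The sketch below only builds the commands. It assumes ParSNP v1.x options (`-r` reference, `-d` directory of assemblies, `-o` output directory), that ParSNP writes `parsnp.ggr` into the output directory, and that `harvesttools -V` is used for the VCF conversion. All file names and the helper name `parsnp_commands` are hypothetical.

```python
import os


def parsnp_commands(ref_fasta, assemblies_dir, out_dir, vcf_path="parsnp.vcf"):
    """Return the commands (as argument lists) for the assembly-based workflow."""
    ggr = os.path.join(out_dir, "parsnp.ggr")   # assumed ParSNP alignment output
    return [
        # 1. Align all assemblies in the directory to the reference
        ["parsnp", "-r", ref_fasta, "-d", assemblies_dir, "-o", out_dir],
        # 2. Convert the ParSNP alignment to a multi-sample VCF
        ["harvesttools", "-i", ggr, "-V", vcf_path],
        # 3. Call AMR mutations from the multi-sample VCF
        ["python", "genoparatyphi.py", "--mode", "vcf_parsnp",
         "--vcf", vcf_path, "--output", "genotypes.txt"],
    ]
```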
