Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is it necessary to extract ChrM first #2

Open
Captain-Pam opened this issue Sep 16, 2022 · 2 comments
Open

Is it necessary to extract ChrM first #2

Captain-Pam opened this issue Sep 16, 2022 · 2 comments

Comments

@Captain-Pam
Copy link

Hi,
Thank you for your tools. I am trying to make it work. I have encountered some problems.

  1. There are many reference files in RefSeq . I installed HP_RNAME=hg38 for the first time later, and HP_RNAME=hg19 later. So do the two reference genome versions HP_RNAME=hg19 and HP_RNAME=hg38 differ only by a few files NUMT and chrM in the RefSeq?
  2. My bam file is quite large, do I need to extract the telomeric bam file before running the program?
    It seems impossible, because the calculation of mtDNA-CN requires bam files for all autosomes.
@dpuiu
Copy link
Owner

dpuiu commented Sep 23, 2022

Hello Captain-Pam and sorry about the delay in getting back to you.

  1. If you are planning to use hg19 instead of hg38, just copy over and edit the "init.sh" file. Comment the ~6 lines and #hs38DH(default) and uncomment the ones under #hg19

  2. Do you mean telomeric or mitochondrial? If you want to work with smaller bam files, you can prefilter the large ones, however for mtDNA-CN computation your have to use the idxstats corresponding to the original alignments ( which contain the read counts for all the chromosomes)

$ mkdir new_dir
$ samtools view file.bam chrM chr1:564465-569708 chr17:22020692-22020827 -b ... > new_dir/file.bam # (hg19)
$ samtools index new_dir/file.bam
$ samtools idxstats file.bam > new_dir/file.idxstats

@Captain-Pam
Copy link
Author

Hey, Thank your for your reply. I have got it working. I also compared software 'fastMitoCalc' to calculate mtDNA-CN, which was used in your paper.
But I found the mtDNA-CN using MitoHPC was more copy numbers than that using fastMitoCalc. The biggest gap can reach 2600 and I checked the 'idxstats' file and recalculated the mtDNA-CN using the formula in the paper and the mtDNA-CN was the same as 'M' in the count.tab (M column).

  1. Is this mtDNA-CN gap between 'MitoHPC' and 'fastMitoCalc' caused by the fact that fastMitoCalc is a random sampling of a certain number of regions to calculate mtDNA-CN or just using chromosome 1-22? Here are my two results from MitoHPC (mtDNA-CN: 3366) and fastMitoCalc (mtDNA-CN: 693.719)
    MitoHPC :
    chrM 16571 13458639 44004
    chr1 249250621 118214482 314343
    chr2 243199373 117972464 427951
    chr3 198022430 85924989 269176
    chr4 191154276 98803920 267515
    chr5 180915260 77116047 222679
    chr6 171115067 74220704 212901
    chr7 159138663 76211898 219534
    chr8 146364022 67936630 217216
    chr9 141213431 55903272 159052
    chr10 135534747 100631038 233961
    chr11 135006516 60413722 174248
    chr12 133851895 59082673 167006
    chr13 115169878 39987143 116357
    chr14 107349540 41120393 122057
    chr15 102531392 36480936 102307
    chr16 90354753 47841347 117311
    chr17 81195210 41463621 106312
    chr18 78077248 37650165 103872
    chr19 59128983 34102600 82349
    chr20 63025520 26468498 75683
    chr21 48129895 18105490 54604
    chr22 51304566 15652517 41953
    chrX 155270560 71478267 188991
    chrY 59373566 8816552 34535
    chr1_gl000191_random 106433 35277 82
    chr1_gl000192_random 547496 229270 679
    chr4_ctg9_hap1 590426 119302 399
    chr4_gl000193_random 189789 174745 574
    chr4_gl000194_random 191469 139145 415
    chr6_apd_hap1 4622290 128934 385
    chr6_cox_hap2 4795371 384416 1016
    chr6_dbb_hap3 4610396 285520 793
    chr6_mann_hap4 4683263 257711 812
    chr6_mcf_hap5 4833398 286283 745
    chr6_qbl_hap6 4611984 263381 805
    chr6_ssto_hap7 4928567 294652 814
    chr7_gl000195_random 182896 266561 805
    chr8_gl000196_random 38914 13476 33
    chr8_gl000197_random 37175 5782 23
    chr9_gl000198_random 90085 212405 700
    chr9_gl000199_random 169874 10107120 15130
    chr9_gl000200_random 187035 30469 91
    chr9_gl000201_random 36148 3949 13
    chr11_gl000202_random 40103 9874 26
    chr17_ctg5_hap1 1680828 242877 775
    chr17_gl000203_random 37498 19638 44
    chr17_gl000204_random 81310 19130 85
    chr17_gl000205_random 174588 231417 653
    chr17_gl000206_random 41001 4928 16
    chr18_gl000207_random 4262 3083 49
    chr19_gl000208_random 92689 426440 1108
    chr19_gl000209_random 159169 35332 87
    chr21_gl000210_random 27682 3730 14
    chrUn_gl000211 166566 93986 270
    chrUn_gl000212 186858 196808 957
    chrUn_gl000213 164239 39830 95
    chrUn_gl000214 137718 592308 1312
    chrUn_gl000215 172545 26726 93
    chrUn_gl000216 172294 2987761 6992
    chrUn_gl000217 172149 122195 384
    chrUn_gl000218 161147 239626 549
    chrUn_gl000219 179198 185568 1029
    chrUn_gl000220 161802 5933799 13490
    chrUn_gl000221 155397 186931 492
    chrUn_gl000222 186861 105965 349
    chrUn_gl000223 180455 49609 93
    chrUn_gl000224 179693 686448 1711
    chrUn_gl000225 211173 1168363 4173
    chrUn_gl000226 15008 5528413 7990
    chrUn_gl000227 128374 33551 101
    chrUn_gl000228 129120 384732 754
    chrUn_gl000229 19913 47762 148
    chrUn_gl000230 43691 31270 115
    chrUn_gl000231 27386 51359 149
    chrUn_gl000232 40652 97229 295
    chrUn_gl000233 45941 37658 88
    chrUn_gl000234 40531 85785 243
    chrUn_gl000235 34474 52575 159
    chrUn_gl000236 41934 15104 47
    chrUn_gl000237 45867 99997 231
    chrUn_gl000238 39939 8884 24
    chrUn_gl000239 33824 58609 128
    chrUn_gl000240 41933 26847 75
    chrUn_gl000241 42152 97210 294
    chrUn_gl000242 43523 9432 25
    chrUn_gl000243 43341 41654 106
    chrUn_gl000244 39929 18093 65
    chrUn_gl000245 36651 41757 123
    chrUn_gl000246 38154 12727 32
    chrUn_gl000247 36422 23342 42
    chrUn_gl000248 39786 8494 21
    chrUn_gl000249 38502 7260 14
  • 0 0 56407410

fastMitoCalc:
mt_copy_number_avg: 693.719
mt_coverage: 7916.24
autosomal_coverage: 22.8226
actual basepairs covered by reads: 3000000
chrom_used_for_autosomal_coverage: 1-22

  1. I found that the chrM in the 'idxstats' file is not 16569 but 16571. I set it to HP_MT=chrM, HP_MTLEN=16569 in 'init.sh'. I don't know why it became 16571 here. On the other hand, HP_MT=chrM and HP_MT= rCRS (or RSRS) are different reference sequences, the article says rCRS is used, but the code is HP_MT=chrM.

export HP_MT=chrM # chrM, rCRS or RSRS, FASTA file available under $HP_RDIR

  1. chrM reference sequence (rCRS or RSRS) were from a European individual. Can other populations (e.g., Asian populations) use the chrM reference sequence above (rCRS or RSRS) in haplogroup and mitochondrial variants analysis?

I look forward to hearing from you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants