Analysis and benchmarking of small and large genomic variants across tandem repeats

  • Levinson, G. & Gutman, G. A. Slipped-strand mispairing: a major mechanism for DNA sequence evolution. Mol. Biol. Evol. 4, 203–221 (1987).

  • Fan, H. & Chu, J.-Y. A brief review of short tandem repeat mutation. Genom. Proteom. Bioinform. 5, 7–14 (2007).

  • Shriver, M. D., Jin, L., Chakraborty, R. & Boerwinkle, E. VNTR allele frequency distributions under the stepwise mutation model: a computer simulation approach. Genetics 134, 983–993 (1993).

  • Wright, J. M. Mutation at VNTRs: are minisatellites the evolutionary progeny of microsatellites? Genome 37, 345–347 (1994).

  • Willems, T. et al. The landscape of human STR variation. Genome Res. 24, 1894–1904 (2014).

  • Ren, J., Gu, B. & Chaisson, M. J. P. vamos: variable-number tandem repeats annotation using efficient motif sets. Genome Biol. 24, 175 (2023).

  • Noyes, M. D. et al. Familial long-read sequencing increases yield of de novo mutations. Am. J. Hum. Genet. 109, 631–646 (2022).

  • DeJesus-Hernandez, M. et al. Expanded GGGGCC hexanucleotide repeat in noncoding region of C9ORF72 causes chromosome 9p-linked FTD and ALS. Neuron 72, 245–256 (2011).

  • Depienne, C. & Mandel, J.-L. 30 years of repeat expansion disorders: what have we learned and what are the remaining challenges? Am. J. Hum. Genet. 108, 764–785 (2021).

  • Mirceta, M., Shum, N., Schmidt, M. H. M. & Pearson, C. E. Fragile sites, chromosomal lesions, tandem repeats, and disease. Front. Genet. 13, 985975 (2022).

  • Hannan, A. J. Repeat DNA expands our understanding of autism spectrum disorder. Nature 589, 200–202 (2021).

  • Hannan, A. J. Tandem repeats mediating genetic plasticity in health and disease. Nat. Rev. Genet. 19, 286–298 (2018).

  • Stanley, U. et al. Forensic DNA profiling: autosomal short tandem repeat as a prominent marker in crime investigation. Malays. J. Med. Sci. 27, 22–35 (2020).


  • Hall, C. L. et al. Accurate profiling of forensic autosomal STRs using the Oxford Nanopore Technologies MinION device. Forensic Sci. Int. Genet. 56, 102629 (2022).

  • Warner, J. P. et al. A general method for the detection of large CAG repeat expansions by fluorescent PCR. J. Med. Genet. 33, 1022–1026 (1996).

  • Jeffreys, A. J., Wilson, V. & Thein, S. L. Hypervariable ‘minisatellite’ regions in human DNA. Nature 314, 67–73 (1985).

  • Dolzhenko, E. et al. ExpansionHunter: a sequence-graph based tool to analyze variation in short tandem repeat regions. Bioinformatics 35, 4754–4756 (2019).

  • Willems, T. et al. Genome-wide profiling of heritable and de novo STR variations. Nat. Methods 14, 590–592 (2017).

  • Mousavi, N., Shleizer-Burko, S., Yanicky, R. & Gymrek, M. Profiling the genome-wide landscape of tandem repeat expansions. Nucleic Acids Res. 47, e90 (2019).

  • Dolzhenko, E. et al. Characterization and visualization of tandem repeats at genome scale. Nat. Biotechnol. https://doi.org/10.1038/s41587-023-02057-3 (2024).

  • Chiu, R., Rajan-Babu, I.-S., Friedman, J. M. & Birol, I. Straglr: discovering and genotyping tandem repeat expansions using whole genome long-read sequences. Genome Biol. 22, 224 (2021).

  • Nurk, S. et al. The complete sequence of a human genome. Science 376, 44–53 (2022).

  • Aganezov, S. et al. A complete reference genome improves analysis of human genetic variation. Science 376, eabl3533 (2022).

  • Rhie, A. et al. The complete sequence of a human Y chromosome. Nature 621, 344–354 (2023).

  • Olson, N. D. et al. Variant calling and benchmarking in an era of complete human genome sequences. Nat. Rev. Genet. 24, 464–483 (2023).

  • Majidian, S., Agustinho, D. P., Chin, C.-S., Sedlazeck, F. J. & Mahmoud, M. Genomic variant benchmark: if you cannot measure it, you cannot improve it. Genome Biol. 24, 221 (2023).

  • Wagner, J. et al. Benchmarking challenging small variants with linked and long reads. Cell Genom. 2, 100128 (2022).

  • Zook, J. M. et al. A robust benchmark for detection of germline large deletions and insertions. Nat. Biotechnol. 38, 1347–1355 (2020).

  • Wagner, J. et al. Curated variation benchmarks for challenging medically relevant autosomal genes. Nat. Biotechnol. 40, 672–680 (2022).

  • English, A. C., Menon, V. K., Gibbs, R. A., Metcalf, G. A. & Sedlazeck, F. J. Truvari: refined structural variant comparison preserves allelic diversity. Genome Biol. 23, 271 (2022).

  • Yang, J. & Chaisson, M. J. P. TT-Mars: structural variants assessment based on haplotype-resolved assemblies. Genome Biol. 23, 110 (2022).

  • Audano, P. A. & Beck, C. R. Small polymorphisms are a source of ancestral bias in structural variant breakpoint placement. Genome Res. 34, 7–19 (2024).

  • Fu, Y., Mahmoud, M., Muraliraman, V. V., Sedlazeck, F. J. & Treangen, T. J. Vulcan: improved long-read mapping and structural variant calling via dual-mode alignment. GigaScience 10, giab063 (2021).

  • Gelfand, Y., Rodriguez, A. & Benson, G. TRDB—the Tandem Repeats Database. Nucleic Acids Res. 35, D80–D87 (2007).

  • Halman, A., Dolzhenko, E. & Oshlack, A. STRipy: a graphical application for enhanced genotyping of pathogenic short tandem repeats in sequencing data. Hum. Mutat. 43, 859–868 (2022).

  • Kent, W. J. et al. The human genome browser at UCSC. Genome Res. 12, 996–1006 (2002).

  • Saini, S., Mitra, I., Mousavi, N., Fotsing, S. F. & Gymrek, M. A reference haplotype panel for genome-wide imputation of short tandem repeats. Nat. Commun. 9, 4397 (2018).

  • Benson, G. Tandem Repeats Finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).

  • Smit, A., Hubley, R. & Green, P. RepeatMasker. http://www.repeatmasker.org (2013).

  • Wlodzimierz, P., Hong, M. & Henderson, I. R. TRASH: tandem repeat annotation and structural hierarchy. Bioinformatics 39, btad308 (2023).

  • Novák, P., Neumann, P. & Macas, J. Global analysis of repetitive DNA from unassembled sequence reads using RepeatExplorer2. Nat. Protoc. 15, 3745–3776 (2020).

  • Delucchi, M., Näf, P., Bliven, S. & Anisimova, M. TRAL 2.0: tandem repeat detection with circular profile hidden Markov models and evolutionary aligner. Front. Bioinform. 1, 691865 (2021).

  • El-Sawy, M. & Deininger, P. Tandem insertions of Alu elements. Cytogenet. Genome Res. 108, 58–62 (2004).

  • Moretti, T. R. et al. Population data on the expanded CODIS core STR loci for eleven populations of significance for forensic DNA analyses in the United States. Forensic Sci. Int. Genet. 25, 175–181 (2016).

  • Collins, R. L. et al. A structural variation reference for medical and population genetics. Nature 581, 444–451 (2020).

  • Stevanovski, I. et al. Comprehensive genetic diagnosis of tandem repeat expansion disorders with programmable targeted nanopore sequencing. Sci. Adv. 8, eabm5386 (2022).

  • Pellerin, D. et al. Deep intronic FGF14 GAA repeat expansion in late-onset cerebellar ataxia. N. Engl. J. Med. 388, 128–141 (2022).

  • Tan, D. et al. CAG repeat expansion in THAP11 is associated with a novel spinocerebellar ataxia. Mov. Disord. 38, 1282–1293 (2023).

  • Mukamel, R. E. et al. Protein-coding repeat polymorphisms strongly shape diverse human phenotypes. Science 373, 1499–1505 (2021).

  • Liu, Z. et al. Inconsistent genotyping call at DYS389 locus and implications for interpretation. Int. J. Legal Med. 132, 1043–1048 (2018).

  • White, P. S., Tatum, O. L., Deaven, L. L. & Longmire, J. L. New, male-specific microsatellite markers from the human Y chromosome. Genomics 57, 433–437 (1999).

  • Vinces, M. D., Legendre, M., Caldara, M., Hagihara, M. & Verstrepen, K. J. Unstable tandem repeats in promoters confer transcriptional evolvability. Science 324, 1213–1216 (2009).

  • Sulovari, A. et al. Human-specific tandem repeat expansion and differential gene expression during primate evolution. Proc. Natl Acad. Sci. USA 116, 23243–23253 (2019).

  • Annear, D. J. et al. Abundancy of polymorphic CGG repeats in the human genome suggest a broad involvement in neurological disease. Sci. Rep. 11, 2515 (2021).

  • Liao, W.-W. et al. A draft human pangenome reference. Nature 617, 312–324 (2023).

  • Ebert, P. et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 372, eabf7117 (2021).

  • Garg, S. et al. Chromosome-scale, haplotype-resolved assembly of human genomes. Nat. Biotechnol. 39, 309–312 (2021).

  • Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).

  • Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).

  • Jarvis, E. D. et al. Semi-automated assembly of high-quality diploid human reference genomes. Nature 611, 519–531 (2022).

  • Dunn, T. & Narayanasamy, S. vcfdist: accurately benchmarking phased small variant calls in human genomes. Nat. Commun. 14, 8149 (2023).

  • Cleary, J. G. et al. Comparing variant call files for performance benchmarking of next-generation sequencing variant calling pipelines. Preprint at bioRxiv https://doi.org/10.1101/023754 (2015).

  • Tan, A., Abecasis, G. R. & Kang, H. M. Unified representation of genetic variants. Bioinformatics 31, 2202–2204 (2015).

  • Marco-Sola, S., Moure, J. C., Moreto, M. & Espinosa, A. Fast gap-affine pairwise alignment using the wavefront algorithm. Bioinformatics 37, btaa777 (2020).


  • Sedlazeck, F. J. et al. Accurate detection of complex structural variations using single-molecule sequencing. Nat. Methods 15, 461–468 (2018).

  • Park, J., Kaufman, E., Valdmanis, P. N. & Bafna, V. TRviz: a Python library for decomposing and visualizing tandem repeat sequences. Bioinform. Adv. 3, vbad058 (2023).

  • Krause, A. et al. Junctophilin 3 (JPH3) expansion mutations causing Huntington disease like 2 (HDL2) are common in South African patients with African ancestry and a Huntington disease phenotype. Am. J. Med. Genet. B 168, 573–585 (2015).

  • Wieben, E. D. et al. A common trinucleotide repeat expansion within the transcription factor 4 (TCF4, E2-2) gene predicts Fuchs corneal dystrophy. PLoS ONE 7, e49083 (2012).

  • Jam, H. Z. et al. A deep population reference panel of tandem repeat variation. Nat. Commun. 14, 6711 (2023).

  • Bakhtiari, M., Shleizer-Burko, S., Gymrek, M., Bansal, V. & Bafna, V. Targeted genotyping of variable number tandem repeats with adVNTR. Genome Res. 28, 1709–1719 (2018).

  • Sonay, T. B. et al. Tandem repeat variation in human and great ape populations and its impact on gene expression divergence. Genome Res. 25, 1591–1599 (2015).

  • Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).

  • Howe, K. L. et al. Ensembl 2021. Nucleic Acids Res. 49, D884–D891 (2020).

  • English, A. Project Adotto tandem-repeat regions and annotations. Zenodo https://doi.org/10.5281/zenodo.8387564 (2022).

  • Danecek, P. et al. Twelve years of SAMtools and BCFtools. GigaScience 10, giab008 (2021).

  • English, A. Project Adotto whole-genome variants. Zenodo https://doi.org/10.5281/zenodo.6975244 (2022).

  • Li, H. et al. A synthetic-diploid benchmark for accurate variant-calling evaluation. Nat. Methods 15, 595–597 (2018).

  • Chin, C.-S. et al. A diploid assembly-based benchmark for variants in the major histocompatibility complex. Nat. Commun. 11, 4794 (2020).

  • Wootton, J. C. & Federhen, S. Statistics of local complexity in amino acid sequences and sequence databases. Comput. Chem. 17, 149–163 (1993).

  • Šošić, M. & Šikić, M. Edlib: a C/C++ library for fast, exact sequence alignment using edit distance. Bioinformatics 33, btw753 (2016).


  • Bonfield, J. K. et al. HTSlib: C library for reading/writing high-throughput sequencing data. GigaScience 10, giab007 (2021).

  • Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).

  • English, A. et al. GIAB TandemRepeats benchmark v1.0. https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/HG002_NA24385_son/TandemRepeats_v1.0 (2023).

  • English, A. et al. GIAB TR comparison VCFs. Zenodo https://doi.org/10.5281/zenodo.10724503 (2024).

  • English, A. et al. Working space for the GIAB TR benchmarking project. GitHub https://github.com/ACEnglish/adotto (2023).

  • English, A. Structural variant toolkit for VCFs. GitHub https://github.com/ACEnglish/truvari (2023).

  • English, A. et al. Library for variant benchmarking stratification. GitHub https://github.com/ACEnglish/laytr (2023).

  • Olson, N. A snakemake based pipeline to build Adotto TR databases. GitHub https://github.com/nate-d-olson/adotto-smk (2023).

  • English, A. A rust implementation of regioneR for interval overlap permutation testing. GitHub https://github.com/ACEnglish/regioners (2023).


