Abstract
Hispanic populations in the United States are highly admixed and genetically diverse, yet they remain underrepresented in genomic studies. To address this gap, we present the first large-scale long-read sequencing analysis of 1,490 self-reported Hispanic individuals from the All of Us Research Program, capturing small variants, structural variants, tandem repeats (TRs), and CpG methylation. We characterize global and local ancestry across the cohort, which enables ancestry-aware analysis of genetic and epigenetic features. We identify over 10.3 million previously unknown autosomal variants, including medically relevant alleles that we stratify by local ancestry and pathogenic risk; this stratification reveals 402 carriers with potential risk for subsequent generations. We find 135 individuals carrying TR alleles that exceed established pathogenic ranges. We also conduct the first genome-wide tandem repeat methylation quantitative trait locus (TR-mQTL) analysis, identifying 3,329 TR alleles associated with methylation. Allele-specific methylation (ASM) is resolved at more than 12,000 loci per genome, and we identify 24 novel recurrent ASM loci. These include ancestry-specific regulatory activity, such as the activation of paralogous genes driven by ancestry-enriched variants and epigenetic markers. Together, these findings establish a foundational resource for biomedical research and highlight the critical role of ancestry-aware analyses in understanding gene regulation, disease risk, and personalized medicine.
About the speaker
Dr. Fritz Sedlazeck is an Associate Professor at the Human Genome Sequencing Center at Baylor College of Medicine and an Adjunct Associate Professor at Rice University. His research focuses on algorithm development and high-performance computing for genomic and genetic applications. Specifically, he studies ways to improve the characterization of complex genomic alterations between individuals' genomes using large-scale sequencing data, and thereby to improve our understanding of complex phenotypes such as human diseases.
