I am a postdoc in Mark Daly’s lab in the Analytic & Translational Genetics Unit at Massachusetts General Hospital as well as with the Stanley Center for Psychiatric Research at the Broad Institute. I completed my PhD in Genetics and MS in Biomedical Informatics in Carlos Bustamante’s lab at Stanford. I am interested in human demography and leveraging evolutionary principles to discover genetic predictors of phenotypic diversity. My research explores the role of genetic and regulatory variation in diverse and admixed populations. More specifically, I work in the following areas:

The impact of genetic diversity on regulatory mechanisms

The first chapter of my PhD thesis examined regulatory variation across diverse human populations to explore its potential role in disease mechanisms, specifically at the transcriptome and epigenetic levels. To understand transcriptome variation across these populations, we sequenced mRNA, exomes, and genomes from lymphoblastoid cell lines (LCLs) derived from 45 individuals in 7 populations of the Human Genome Diversity Panel (HGDP). These populations were explicitly chosen to span the breadth of human migration history, enabling fine-scale analyses that compare genetic distances between populations with differences in gene expression and transcript splicing. We found that ~25% of the variation in transcription among individuals can be attributed to population differences, and that ~76% of this population-specific variation is due to gene expression rather than splicing variability. Our results also indicated that previously identified common expression quantitative trait loci (eQTLs) replicated consistently across populations, regardless of genetic distance. Focusing on the DNA data from HGDP, we found evidence of increasing mutational load with distance from sub-Saharan Africa. We also analyzed genetic effects on epigenetic modifications using ChIP-seq, Hi-C, and ChIA-PET data in Yoruban LCLs. We identified histone quantitative trait loci (hQTLs) and found that genetic variation modulates chromatin states both locally and distally. As part of this project, I found that hQTLs are significantly enriched for genome-wide association study variants, especially those associated with autoimmune diseases.

The design and advancement of genotyping technologies for minority health

The second chapter of my PhD focused on the design and development of genotyping technologies for the advancement of minority health. As part of this chapter, I assessed genotype imputation quality across multiple arrays and populations to learn how to better design genotyping technologies. I learned that imputation using next-generation arrays that focus on exome content performs poorly, and that globally common variation is the most informative for imputation within and across populations. Our results provided design principles for generating a balanced array, which is critical for globally diverse population studies: select variants that are common among populations, reduce European ascertainment biases, select variants for genotyping arrays in proportion to how many variants they tag, and weight SNPs in proportion to linkage disequilibrium decay across populations. With these considerations in mind, I participated in the Population Architecture using Genomics and Epidemiology (PAGE) Consortium, where I leveraged my previous experience in cross-population imputation to help develop the multi-ethnic genotyping array (MEGA), which is now being used to genotype ~50,000 samples from the PAGE Consortium and is commercially available from Illumina. An important consideration for the design of genotyping technologies in minority groups is the local ancestry across chromosomes and imputation accuracy within ancestry tracts. I became especially interested in variation across recently admixed populations and concurrently joined the 1000 Genomes Consortium, where I performed ancestry deconvolution in Hispanic/Latino and African American samples to paint chromosomes with Native American, European, and African ancestry.

A complex, polygenic architecture for lightened skin pigmentation in the Southern African KhoeSan

Skin pigmentation is one of the most recognizably diverse phenotypes in humans across the globe, but its genetic basis has mainly been studied in northern European and Asian populations. The Eurasian pigmentation alleles are among the most differentiated variants in the genome, suggesting strong positive selection for light skin pigmentation. Light skin pigmentation is also observed in the far southern latitudes of Africa, among KhoeSan hunter-gatherers of the Kalahari Desert and other populations. The KhoeSan hunter-gatherers are among the oldest human populations, believed to have diverged from other populations 100,000 years ago, and maintain extraordinary levels of genetic diversity. It is unknown whether light skin pigmentation represents convergent evolution or the ancestral human phenotype. We have collected ethnographic information, pigmentation phenotypes, and DNA from >400 KhoeSan individuals from the Kalahari and Richtersveld. To understand the genetic basis for light skin pigmentation, we have also exome sequenced 83 ≠Khomani San individuals to high coverage, generating one of the largest indigenous African exome datasets sequenced outside of the 1000 Genomes Project. We demonstrate that skin pigmentation is highly heritable in these groups, but known pigmentation loci explain only a small fraction of the variance. Rather, baseline skin pigmentation is a complex, polygenic trait in the KhoeSan. We identify novel skin pigmentation loci using a genome-wide association approach complemented by targeted resequencing of pigmentation genes and functional follow-up. The KhoeSan carry several light skin pigmentation alleles discovered in Eurasians, such as the canonical SLC24A5 allele, at higher frequencies than expected by the proportion of recent European admixture, indicating that standing variation likely contributed in part to lightened skin color in other populations. Our results highlight the strength of diverse population studies to explain phenotypic variation in the context of human evolutionary history.

Developing haplotype association methods for insights into autism spectrum disorders

A central challenge in medical and population genetics is identifying disease-conferring variants, interpreting their functional roles across individuals and populations, and designing therapeutics to ameliorate the disease. Genome-wide association studies (GWAS) have successfully identified myriad risk variants associated with many disease traits, but a clear understanding of the genetic effects that drive highly heritable psychiatric diseases such as autism spectrum disorders (ASDs) has remained elusive. ASD reduces fecundity by ~65%, indicating that negative selection plays a significant role in rapidly eliminating variants conferring even modest disease risk from human populations. Consequently, loci contributing risk are expected to be significantly rarer on average than neutral variation.

Because negatively selected ASD-risk variants are expected to be short-lived, and therefore young, the haplotypes on which they reside have had less time to be broken down by recombination and are longer than average; as a result, haplotype-based mapping methods are significantly more powerful than standard GWAS analytic approaches for mapping negatively selected loci. However, although large-scale genotyping data can be used for such analyses, the development and evaluation of haplotype mapping methods have lagged considerably behind standard GWAS methods. We are improving haplotype mapping techniques with consideration for multiple demographic histories, enabling the discovery of novel disease-risk haplotypes and providing insight into their evolutionary pressures. We will apply our methodology to discover ASD risk factors in the largest case-control studies to date, bypassing previous study limitations in sample size and methodology.