The following recently published paper goes well beyond a typical Nature Genetics GWAS:
Agarwala, V., Flannick, J., Sunyaev, S., and Altshuler, D. (2013). Evaluating empirical bounds on complex disease genetic architecture. Nat. Genet., 1–12.
The paper is split into three stages, any one of which could stand as a smaller paper of its own: modeling demography, determining the number of loci impacting disease based on coupling with natural selection and mutational target size, and generating both empirical and simulated results for type 2 diabetes (T2D).
The paper starts by comparing the demographic models in three previous papers (Gravel et al., Kryukov et al., and Schaffner et al.). The authors assess the site frequency spectra, strength of selection, minor allele count for nonsynonymous vs. synonymous sites, and linkage disequilibrium patterns. In the end, they adjust the parameters of the three models to build their own hybrid model that best fits their data. Europeans are the population of interest in this paper because the T2D study participants are all of European descent, and Agarwala et al. choose the following parameters:
NA (ancestral population size) = 8,100
NB (bottleneck population size) = 2,000
t (duration of exponential growth in generations) = 370
r (rate of exponential growth) = 1.29%
NE (modern effective population size) = 227,650
μ (mutation rate per bp per generation) = 2.0e-08
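As a quick sanity check on how these parameters fit together, here is a minimal sketch in Python. It assumes the exponential growth phase starts from the bottleneck size NB and ends at the modern size NE, which the list above does not state explicitly; the function names are my own and not from the paper.

```python
import math

# Parameter values copied from the list above.
N_B = 2_000        # bottleneck population size
N_E = 227_650      # modern effective population size
T_GROWTH = 370     # generations of exponential growth
R = 0.0129         # growth rate per generation (1.29%)


def size_discrete(n0, rate, generations):
    """Population size after compounding `rate` once per generation."""
    return n0 * (1.0 + rate) ** generations


def size_continuous(n0, rate, generations):
    """Population size under continuous-time exponential growth."""
    return n0 * math.exp(rate * generations)


def implied_rate(n0, n_final, generations):
    """Per-generation (discrete) growth rate that takes n0 to n_final."""
    return (n_final / n0) ** (1.0 / generations) - 1.0


if __name__ == "__main__":
    # ASSUMPTION: growth starts at N_B and ends at N_E.
    print("discrete size:  ", round(size_discrete(N_B, R, T_GROWTH)))
    print("continuous size:", round(size_continuous(N_B, R, T_GROWTH)))
    print("implied rate:   ", implied_rate(N_B, N_E, T_GROWTH))
```

Under that assumption, the rate implied by NB, NE, and t rounds to the listed 1.29%, so the parameters look mutually consistent to within rounding.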
As a population geneticist, I find this aspect of the paper the most exciting. Understanding ancestry is a critical part of identifying true associations, rather than simply identifying sites that are informative of ancestry substructure (as an aside, see this interesting recent cautionary tale). Another point that arises from their simulations is that >90% of deleterious nonsynonymous (NS) variants are rare (MAF < 0.1%), but fewer than 45% of all rare NS variants are deleterious. Additionally, most rare variants are recently derived and neutral or weakly deleterious, which is consistent with PolyPhen results.
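The two percentages above are different conditional proportions, which is why they can both hold at once. The sketch below uses made-up counts (purely hypothetical, not taken from the paper) to show how ">90% of deleterious variants are rare" and "<45% of rare variants are deleterious" can coexist when neutral rare variants are abundant.

```python
# HYPOTHETICAL counts of NS variants, chosen only to illustrate the asymmetry.
counts = {
    ("deleterious", "rare"): 370,
    ("deleterious", "common"): 30,
    ("neutral", "rare"): 500,
    ("neutral", "common"): 100,
}


def frac_rare_among_deleterious(c):
    """Of all deleterious variants, what fraction are rare?"""
    total = c[("deleterious", "rare")] + c[("deleterious", "common")]
    return c[("deleterious", "rare")] / total


def frac_deleterious_among_rare(c):
    """Of all rare variants, what fraction are deleterious?"""
    total = c[("deleterious", "rare")] + c[("neutral", "rare")]
    return c[("deleterious", "rare")] / total


if __name__ == "__main__":
    print(frac_rare_among_deleterious(counts))
    print(frac_deleterious_among_rare(counts))
```

The practical upshot is that rarity alone is a weak filter for deleteriousness: most of what it captures is recent neutral variation.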
Of course, there are limitations to their model that could be addressed in future work. Only purifying selection and exome capture regions are considered, and pulses of migration would likely also affect their model. The authors acknowledge these points, and suggest that if their model is consistent with the data, the results are at least reasonable.
Another very interesting aspect of the paper is their model of coupling between purifying selection and the likelihood of disease, captured by the variable τ. If τ = 1, variants with large effects on fitness have large effects on the disease, whereas if τ = 0, there is no relationship between the selection coefficients of causal mutations and their impact on disease. With just two parameters (τ and T, the mutational target size), a full range of genetic architectures for complex disease is possible. The authors find that for T2D, the higher the coupling term τ, the smaller the target size for mutation and the fewer causal loci are expected. Conversely, the lower the coupling term, the larger the target size for mutation and the more causal loci are expected. The authors consider two models that are consistent with the data but have widely varying τ and T values, and find widely varying results from their simulated GWAS with large cohort sizes (N = 10,000 and N = 85,000).
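To make the role of τ concrete, here is a small simulation sketch. It assumes a multiplicative form in which a variant's effect on disease scales as s^τ times a noise term, where s is the selection coefficient; that functional form, the log-uniform distribution of s, and the noise level are all my own illustrative assumptions and may not match the paper's exact formulation.

```python
import math
import random


def effect_size(s, tau, noise_sd, rng):
    """ASSUMED coupling: effect scales as s**tau with multiplicative Gaussian noise."""
    return (s ** tau) * (1.0 + rng.gauss(0.0, noise_sd))


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def coupling_correlation(tau, n=5000, noise_sd=0.5, seed=1):
    """Correlation between selection coefficient and disease effect for a given tau."""
    rng = random.Random(seed)
    # HYPOTHETICAL distribution of selection coefficients: log-uniform on [1e-5, 1e-1].
    s_values = [10 ** rng.uniform(-5, -1) for _ in range(n)]
    effects = [effect_size(s, tau, noise_sd, rng) for s in s_values]
    return pearson(s_values, effects)


if __name__ == "__main__":
    for tau in (0.0, 0.5, 1.0):
        print(tau, coupling_correlation(tau))
```

With τ = 0 the effect is just the noise term, so it is uncorrelated with s; as τ approaches 1, strongly selected variants increasingly carry the largest disease effects.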
The last portion of their paper looks forward to future T2D studies and estimates the variance explained under varying cohort sizes and genetic ascertainment. At the scale of 250,000 full genomes sequenced (20k cases and 230k controls, matching the prevalence of the disease), they expect that 75% of the genetic variance should be explained. Someday we will surely reach this level, and it will be interesting to see how physicians integrate lab tests for environmental impacts (e.g., how many cheeseburgers you eat a week) with genetic risk. I am curious to see how transferable these genetic risks will be to other populations, given that the vast majority of rare variants are private to populations. I also look forward to seeing how functional genomics is integrated to aid clinical recommendations.