THE INHERITED BASIS OF COMMON DISEASES
A central question in medicine is to understand why some people get sick and others do not. We seek these answers for multiple reasons: to provide explanations to our patients, to predict disease risk early enough to prevent it, and most important, to understand pathophysiology so as to design rational approaches to prevention and therapy. In some cases, a single environmental exposure is found to play a major role in disease (e.g., smoking and lung cancer, or HIV infection and AIDS). In others, such as Huntington disease or cystic fibrosis, mutation of a single gene is both necessary and sufficient to cause illness. Of course, singular answers are the exception rather than the rule; in most cases, disease arises from the combined action of inborn and somatically acquired alterations in genome sequence, environmental and behavioural exposures, and bad luck. Such disorders, which explain most morbidity and mortality in human populations, are termed complex traits.
As a tool for generating new hypotheses about the root causes of disease, human genetics has a number of unique features. First, it is now possible to systematically query the entire genome sequence of an individual in a manner unlimited by any prior assumption about underlying genes and pathophysiologic processes responsible. Second, because the constitutional genome sequence is established at conception and unaltered throughout life, associations between genome sequence and human phenotype can be interpreted as causal rather than reactive in their relationship to disease. However, although we have entered an era in which the specific genes and variants that contribute to risk for common human diseases can be identified, much work is needed to understand their functions and to learn whether and how this knowledge can improve the practice of medicine.
Susceptibility to disease varies within and across human populations. Studies of familial aggregation can determine the extent to which inherited difference in the genome sequence contributes to variation in disease risk. Such studies are simple in concept and ask whether members of the same family are more similar in disease risk compared with individuals chosen at random from the population. Of course, familial clustering can reflect not only shared genotype but also shared environment. The contribution of shared genotype can be dissected further by examining concordance of disease in proportion to the extent of genetic relatedness. The simplest such design involves comparing rates of disease concordance among dizygotic and monozygotic twin pairs. More sophisticated methods have now been developed in which the relatedness of individuals is estimated directly from genotype data (rather than based on pedigrees) and concordance compared with these empirically derived estimates of relatedness. With each of these approaches, common diseases such as types 1 and 2 diabetes mellitus, obesity, hypertension, coronary artery disease, autoimmune diseases, common cancers, schizophrenia, and bipolar disease show rates of disease concordance that rise with genetic similarity. However, many other traits of clinical interest (e.g., most drug responses) have not been studied with these methods, and thus the role of inheritance in these characteristics cannot be assumed. That is, variability in a clinical phenotype (such as drug response) cannot be assumed to be inherited in nature—family studies or molecular genetic studies are needed to draw any such conclusion.
Data about familial aggregation allow the calculation of heritability, defined as the fraction of interindividual variability in disease risk attributable to additive genetic influences. In this framework, the remaining variability among individuals is due to the sum of all other contributions to disease risk: environmental influences on disease, nonadditive (epistatic) genetic effects (e.g., gene-gene interactions or gene-environment interactions), error in the measurement of relatedness or disease, and random chance. For most clinically important traits (diseases and risk factors), empirical estimates of heritability range from 20 to 80%.
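One classical way to turn twin data into a heritability estimate is Falconer's formula, which doubles the difference between the monozygotic and dizygotic twin correlations. The text does not name an estimator, so the formula, the function name, and the input correlations below are illustrative assumptions rather than results from any study.

```python
def falconer_heritability(r_mz: float, r_dz: float) -> float:
    """Estimate additive heritability from twin correlations.

    Assumes Falconer's formula, h^2 = 2 * (r_MZ - r_DZ), which relies on
    monozygotic and dizygotic pairs sharing their environments equally.
    """
    h2 = 2.0 * (r_mz - r_dz)
    # Sampling noise can push the raw estimate outside [0, 1]; clip it.
    return max(0.0, min(1.0, h2))


# Hypothetical trait correlations, chosen only for illustration.
print(falconer_heritability(r_mz=0.8, r_dz=0.5))
```

The estimate rises as monozygotic pairs become more alike relative to dizygotic pairs, which is the pattern of concordance rising with genetic similarity described above.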
In interpreting estimates of heritability, it is important to consider two crucial factors: the effect of measurement errors and the environmental context. First, measurement errors decrease the estimate of the observed heritability of a trait. For example, a single measurement of blood pressure is much less heritable than a composite score based on serial measures of blood pressure over time. That is, estimates of heritability are lower bounds because day-to-day variability and imprecision in clinical measures can obscure the underlying biologic susceptibility entrained by inheritance. For the patient and physician, this means that although the blood pressure on a given day may not be particularly heritable, the blood pressure over time (which is the relevant risk factor for vascular disease) is heritable to a greater extent.
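The effect of measurement error can be sketched with a simple model in which each reading is a stable underlying value plus independent, nonheritable noise. Under that assumption (which is ours, not the text's), averaging several readings shrinks the noise variance in proportion to the number of readings, and the observed heritability approaches the true value. All numbers and names below are hypothetical.

```python
def observed_heritability(h2_true: float, var_true: float,
                          var_error: float, n_measurements: int = 1) -> float:
    """Heritability of the mean of n noisy readings of a trait.

    Assumes independent measurement error with variance var_error per
    reading, so the mean of n readings has error variance var_error / n.
    """
    if n_measurements < 1:
        raise ValueError("n_measurements must be at least 1")
    # Reliability: share of observed variance due to the stable trait.
    reliability = var_true / (var_true + var_error / n_measurements)
    return h2_true * reliability


# Hypothetical trait with true heritability 0.5 and noise as large as signal.
for n in (1, 4, 16):
    print(n, observed_heritability(0.5, var_true=1.0, var_error=1.0,
                                   n_measurements=n))
```

In this sketch a single reading understates the heritability, and the serial-measurement composite recovers most of it, which mirrors the blood pressure example.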
Second, estimates of heritability apply only to the context of the environment in which the study was performed. In the case in which environmental triggers of disease are relatively constant across a study population, inherited factors may explain much of the variation in rates of disease. In contrast, in the case in which exposure to environmental causes of disease is highly varied across the study population, nongenetic factors may outweigh the contribution of that same extent of variability in inborn susceptibility. For example, the rate and diversity of smoking behaviour have a major impact on how much of the variability in rates of lung cancer (in any given study or patient cohort) is explained by inheritance. If smoking was absent from a given population (or ubiquitous), little of the variation in lung cancer risk would be due to smoking behaviour; if, in contrast, half the population smoked multiple packs a day and the other half not at all, smoking behaviour would dominate over inborn susceptibility.
For these reasons, heritability is not a fixed characteristic of a given disease but an assessment of a given population, a set of measurements, and the extent to which variability in genetic and environmental exposure explains disease risk. Thus, there is no contradiction between a disease’s being highly heritable (in a given population) and yet having rates that vary dramatically across populations separated by time, geography, or socioeconomic status. In broad comparisons across groups, environmental exposure and methods of clinical ascertainment often vary substantially and contribute to secular changes in patterns of disease. Conversely, within a group exposed to a relatively uniform environment and studied in a standardized manner, genetic susceptibility may play a major role in determining individual risk.
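The dependence of heritability on environmental context can be illustrated with a toy variance model. The sketch assumes only two sources of variation: a fixed genetic variance and a single binary exposure (such as smoking) whose variance follows the Bernoulli form p(1 - p). The effect size and variances are hypothetical, and real traits have many more contributors.

```python
def exposure_variance(prevalence: float, effect: float) -> float:
    """Variance in risk contributed by a binary exposure.

    A binary exposure with the given prevalence has Bernoulli variance
    prevalence * (1 - prevalence), scaled by the squared effect size.
    """
    return effect ** 2 * prevalence * (1.0 - prevalence)


def heritability(var_genetic: float, var_environment: float) -> float:
    """Fraction of total variance attributable to genetic variance."""
    return var_genetic / (var_genetic + var_environment)


# Same genetic variance, three hypothetical exposure prevalences.
for prevalence in (0.0, 0.5, 1.0):
    v_env = exposure_variance(prevalence, effect=2.0)
    print(prevalence, heritability(var_genetic=1.0, var_environment=v_env))
```

When nobody or everybody is exposed, the exposure contributes no variance and the genetic share is at its maximum; when half the population is exposed, the same genetic variance explains a smaller share.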
Heritability expresses the inherited variation in rates of disease; heterozygosity expresses the rate of inherited variation in genome sequences. Heterozygosity is defined as the proportion of sites on the chromosome at which two randomly chosen copies differ in DNA sequence. Because cells are diploid (carry two copies of the genome sequence) and because these two copies were selected in a semirandom manner from the population, heterozygosity is equivalent to the fraction of base pairs that vary between the two copies each of us inherited from our mother and our father. That is, heterozygosity is the rate of genetic variation in the individual.
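As a minimal sketch of this definition, heterozygosity can be computed by comparing two aligned copies of a sequence position by position. The toy sequences and the function name are hypothetical; the closing arithmetic simply scales the chapter's figures of roughly 1 difference per 1000 bases and a genome of 3,000,000,000 base pairs.

```python
def heterozygosity(copy_a: str, copy_b: str) -> float:
    """Fraction of aligned positions at which two sequence copies differ."""
    if len(copy_a) != len(copy_b) or not copy_a:
        raise ValueError("copies must be non-empty and of equal length")
    differing_sites = sum(a != b for a, b in zip(copy_a, copy_b))
    return differing_sites / len(copy_a)


# Toy 10-base sequences (hypothetical), differing at the last position.
maternal = "ACGTACGTAC"
paternal = "ACGTACGTAT"
print(heterozygosity(maternal, paternal))

# Scaling 1 difference per 1000 bases to the whole genome gives the
# expected number of heterozygous sites in one individual.
GENOME_LENGTH = 3_000_000_000
expected_sites = GENOME_LENGTH // 1000
print(expected_sites)
```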
CHARACTERISTICS OF HUMAN GENOME SEQUENCE VARIATION
| Characteristic | Value |
|---|---|
| Length of the human genome sequence (base pairs) | 3,000,000,000 |
| Number of human genes (estimated) | 20,000 |
| Fraction of base pairs that differ between the genome sequences of a human and a chimpanzee | 1.3% (1 in 80) |
| Fraction of base pairs that vary between the genome sequences of any two humans | 0.1% (1 in 1000) |
| Fraction of coding region base pairs that vary in a manner that substantially alters the sequence of the encoded protein | 0.02% (1 in 5000) |
| Number of sequence variants present in each individual as heterozygous sites | 3,000,000 |
| Number of amino acid–altering variants present in each individual as heterozygous sites | 12,000 |
| Number of sequence variants in any given human population with frequency of >1% | 10,000,000 |
| Number of amino acid polymorphisms present in the human genome with a population frequency of >1% | 75,000 |
| Fraction of all human heterozygosity attributable to variants with a frequency of >1% | 98% |
Single-nucleotide polymorphisms (SNPs) are sites at which a single letter in the DNA code has been swapped for a single alternative letter. Such variants are observed at approximately 1 in 1000 positions in the human genome sequence. In the protein-coding regions of genes, rates of genetic variation are lower—less than 1 in every 2000 bases; the rate of variation that substantially alters the sequence of the encoded protein is lower still. The lower rate of variation in coding regions is due to natural selection against alteration in the amino acid sequence of encoded proteins.
Our genomes also contain other types of sequence variation: insertions and deletions of nucleotides; alteration in the number of copies of particular genes and sequences; and larger-scale alterations, such as inversions and translocations. All types of DNA sequence change can influence gene function and contribute to disease.
The genetic variation in each individual is largely attributable to variants that are common. Empirically, more than 98% of the heterozygous sites in each individual display frequency of greater than 1% in the worldwide human population. During the last 15 years, a public database has been built that contains essentially all common sequence variants in the human population (with frequency >1%). At the time of this writing, this public database contains more than 44 million human genetic variants. Not all these entries represent common variants (some are rare), and a small fraction may represent technical false-positive findings.
The major contribution of common variation in human sequence diversity is explained by the unique demographic history of the human population. Despite the global distribution of the current human population, it is now clear that all humans are the descendants of a single population that lived in Africa only 10,000 to 40,000 years ago. The ancestral population was small (with an effective size of perhaps 10,000 individuals), lived a hunter-gatherer existence at low population densities (relative to other humans and later domesticated animals), and had evolved in Africa during millions of years. Most human genetic variation arose in this phase of human history, before the more recent migrations, expansions, and invention of technologies (e.g., farming) that resulted in widespread population of the globe. Most common human genetic variation predates the Diaspora and is shared by all populations on earth.
A second factor is the slow rate of change in human DNA. Mutation and recombination occur at very low rates, on the order of 10⁻⁸ per base pair per generation; and yet, any pair of human genes traces a lineage back to a shared ancestor who lived on the order of 10³ to 10⁴ generations ago (if a generation is 20 years, then 10⁴ generations is 200,000 years). In other words, considering the typical nucleotide in two unrelated humans, it is more likely that they trace back to a shared ancestor without any mutation having occurred than it is that a mutation has arisen in the intervening time. This explains why 99.9% of base pairs are identical when any two copies of the human genome are compared.
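The arithmetic behind this argument can be checked with a short sketch. It assumes that mutations at a site occur independently in each generation at a fixed rate and that both lineages leading back to the shared ancestor accumulate them; these modelling choices are ours. The model is only meant to reproduce the order-of-magnitude reasoning, not the exact 99.9% figure.

```python
def prob_no_mutation(mu: float, generations: int, lineages: int = 2) -> float:
    """Probability that a site is unmutated on every lineage.

    Assumes an independent per-generation mutation probability mu and
    `lineages` separate lines of descent from the shared ancestor.
    """
    return (1.0 - mu) ** (generations * lineages)


def generations_to_years(generations: int, years_per_generation: int = 20) -> int:
    """Convert a number of generations to years."""
    return generations * years_per_generation


MU = 1e-8  # per base pair per generation, order of magnitude from the text
for generations in (10**3, 10**4):
    print(generations_to_years(generations),
          prob_no_mutation(MU, generations))
```

Even at the upper end of the range, the probability that a given site carries no mutation stays very close to 1, which is why identity at a typical nucleotide is far more likely than difference.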
Another aspect of human variation is explained by these simple mathematical and population genetic relationships: the extent of human DNA sequence diversity attributable to rare and common variants. Each of us inherits from our parents some 3 million common polymorphisms (classically defined as those with frequency of >1%). We also inherit genetic variants that are shared by apparently unrelated individuals but are present at frequencies of less than 1%, and we inherit thousands of variants that are limited to the individual and the individual’s closest relatives.
The shared ancestry of human populations explains another aspect of human genetic variation: the correlations among nearby variants known as linkage disequilibrium. Empirically, individuals who carry a particular common variant at one site in the genome are observed to be more likely than chance to carry a particular set of variants at nearby positions along the chromosome. That is, not all combinations of nearby variants are observed in the population but rather only a small subset of the possible combinations. These correlations reflect the fact that most variants in our genomes arose once in human history (typically long ago) and did so on an arbitrary but unique copy carried by the individual in whom the mutation first arose. This unique ancestral copy can be recognized in the current population by the stretch of particular alleles (known as a haplotype). These ancestral haplotypes, passed down from shared prehistoric ancestors in Africa, offer a practical tool in association studies of human disease because it is not necessary to measure directly each nucleotide to capture much of the information.
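Linkage disequilibrium between two sites is commonly summarized by the coefficient D (the excess of a haplotype over what independence would predict) and by r², its squared correlation form. The text does not specify a measure, so these standard statistics and the toy haplotype samples below are assumptions made for illustration.

```python
def linkage_disequilibrium(haplotypes, allele_a: str, allele_b: str):
    """Return (D, r_squared) for two sites.

    `haplotypes` is a list of two-character strings: the first character
    is the allele at site 1 and the second is the allele at site 2.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")
    p_a = sum(h[0] == allele_a for h in haplotypes) / n
    p_b = sum(h[1] == allele_b for h in haplotypes) / n
    p_ab = sum(h[0] == allele_a and h[1] == allele_b for h in haplotypes) / n
    # D is the departure of the haplotype frequency from independence.
    d = p_ab - p_a * p_b
    denominator = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denominator == 0.0:
        raise ValueError("both sites must be polymorphic in the sample")
    return d, d * d / denominator


# Hypothetical samples: only two of four possible haplotypes observed,
# versus all four haplotypes at equal frequency.
correlated = ["AB"] * 50 + ["ab"] * 50
independent = ["AB", "Ab", "aB", "ab"] * 25
print(linkage_disequilibrium(correlated, "A", "B"))
print(linkage_disequilibrium(independent, "A", "B"))
```

When r² is near 1, genotyping one site captures nearly all the information about the other, which is the property that association studies exploit.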
The genetic architecture of a disease refers to the number and magnitude of genetic risk factors that exist in each patient and their frequencies and interactions in the population. Diseases can be due to a single gene in each family (monogenic) or to multiple genes (polygenic). It is easiest to identify genetic risk factors when only a single gene is involved and this gene has a large impact on disease in that family. In cases in which a single gene is necessary and sufficient to cause disease, the condition is termed a mendelian disorder because the disease tracks perfectly with a mutation (in the family) that obeys Mendel’s simple laws of inheritance.
Some single-gene disorders are caused by the same gene in all affected families; for example, cystic fibrosis is always caused by mutations in CFTR. Although many individuals with cystic fibrosis carry the same founder mutation (ΔF508), others carry any pair of a wide variety of different mutations in CFTR. The existence of many different mutations at a given disease gene is known as allelic heterogeneity.
A mendelian disorder can be due to a single genetic lesion in any given family but in different families can be due to mutations in a variety of genes. This phenomenon, termed locus heterogeneity, is illustrated by retinitis pigmentosa. Although mutation in a single gene is typically necessary and sufficient to cause retinitis pigmentosa, there are dozens of different genes in which retinitis pigmentosa mutations have been found (Online Mendelian Inheritance in Man #268000). In each family, however, only one such gene is mutated to cause disease.
Most single-gene disorders are rare (present in <1% of the population) and are manifested early in life. Many are severe and cause death before reproduction in the absence of modern medical care. The fact that most monogenic disorders are severe in childhood and rare in the population is not a coincidence but reflects the impact of natural selection. The deleterious effect of these mutations results in a decrease in reproductive fitness (in individuals unlucky enough to inherit them), and the mutations and the disease are therefore unlikely to drift to high frequency in the population.
There are exceptions to this general idea: cases in which the mutation causing a severe monogenic disease (such as haemoglobin S, the cause of sickle cell anaemia) is common in populations. Such cases appear to be the result of a different kind of selection, known as balancing selection: situations in which a gene mutation is beneficial in one circumstance (a genotype or environment) but deleterious in another. Heterozygous carriers of haemoglobin S are relatively protected against malaria, and this benefit balances the deleterious effect of sickle cell disease in homozygotes.
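The balance described here can be sketched with the textbook heterozygote-advantage model from population genetics. In that model the two homozygotes have fitness 1 - s and 1 - t relative to the heterozygote, and the disadvantaged allele settles at a frequency of s / (s + t). The model and the selection coefficients below are illustrative assumptions, not measured values for haemoglobin S.

```python
def equilibrium_frequency(s: float, t: float) -> float:
    """Equilibrium frequency of the variant allele under heterozygote advantage.

    s: fitness cost to homozygotes for the common allele (e.g., malaria).
    t: fitness cost to homozygotes for the variant allele (e.g., disease).
    """
    return s / (s + t)


def next_generation(q: float, s: float, t: float) -> float:
    """Variant allele frequency after one generation of selection."""
    p = 1.0 - q
    w_common_hom, w_het, w_variant_hom = 1.0 - s, 1.0, 1.0 - t
    mean_fitness = p * p * w_common_hom + 2 * p * q * w_het + q * q * w_variant_hom
    return (p * q * w_het + q * q * w_variant_hom) / mean_fitness


# Hypothetical selection coefficients.
s, t = 0.1, 0.9
q_hat = equilibrium_frequency(s, t)
print(q_hat, next_generation(q_hat, s, t))
```

At the equilibrium frequency one further generation of selection leaves the allele frequency unchanged, so a deleterious allele can persist at an appreciable frequency as long as carriers retain their advantage.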
Starting in the 1980s, the advent of genome-wide linkage analysis led to rapid success at identifying the specific genetic mutations that cause mendelian disorders, and now thousands of genes have been identified for clinically important conditions. Progress was sparked by the development of a suite of powerful research techniques (family-based linkage analysis followed by positional cloning) in which a genome-wide search is undertaken for the causal gene, which is first localized to a chromosomal region. (The initial idea of genetic linkage mapping traces to Sturtevant in fruit flies in 1913 but did not become practical in humans until the 1980s.)
Once the search discovered linkage between a chromosomal region and a disease, that chromosomal neighbourhood was scoured for the genetic culprit, which was recognized by the observation of mutations that altered the protein-coding sequence, enriched in cases of disease compared with unaffected relatives and population-based controls. The power of these approaches prompted and was fuelled by the Human Genome Project, which provided the foundation of information on DNA structure, sequence, and genetic variation required to undertake such searches.
More recently, it has become possible to search for the mutations underlying mendelian diseases by skipping the step of family-based linkage, instead sequencing the genome of the individual and searching for mutations that might explain the disease. If the gene is already known and the mutation easily interpreted (e.g., truncating the protein), this approach is highly efficient and successful. If the gene is rarely mutated and not yet known to cause the disease, or if the mutations are in noncoding regions, direct sequencing still runs up against the analytical and clinical challenge of genome interpretation.
Similar to mendelian disorders, most common diseases are influenced by inheritance. In contrast to mendelian disorders, the genetic contribution to common diseases is typically due to the action of many genes rather than a single gene in each family. Empirical evidence in favour of this model comes from classical family studies, which failed to observe classical mendelian ratios for common diseases. In the 1990s, the tools of family-based linkage analysis were applied to nearly all common disorders. Much of this work was done in isolated founder populations (such as Finland and Iceland) with the goal of simplifying the genetic architecture and accessing extended pedigrees. Excepting a few notable successes, these studies revealed few strong signals that localized the genes responsible for disease, indicating that few cases of common diseases are due to individual genes of large effect. If a single gene contained rare mutations of large effect that explained 20% or more of the inherited risk for type 2 diabetes, hypertension, or schizophrenia, it would long since have been found with linkage analysis.
The next shortcut to understanding the genetic determinants of common diseases was to identify and study rare families with early-onset forms of common diseases that clearly demonstrate mendelian patterns of inheritance. Important examples include the role of BRCA1 and BRCA2 in early-onset breast cancer, maturity-onset diabetes of the young as a form of type 2 diabetes, many monogenic disorders of blood pressure and electrolyte regulation, early-onset Alzheimer disease, and many others.
These successes provide diagnostic information for families burdened with severe, early-onset forms of disease and insight into the underlying pathways responsible for disease. For example, more than 20 genes have been identified that, when mutated, cause rare mendelian disorders of blood pressure and electrolyte regulation. So far, every one of these genes is active in the kidney, and most are involved in the renin-angiotensin-aldosterone pathway. This result is a compelling demonstration of the central importance of the kidney in human blood pressure regulation and has suggested new therapeutic targets of substantial promise.
It was hoped that the genes found to be responsible for early-onset, monogenic forms of common diseases would contribute to the more common forms of disease in the population. In this scenario, severe mutations might cause early-onset forms, and more prevalent but subtle alterations in the same genes might contribute to common forms of disease. A comprehensive test of this hypothesis awaited tools from the Human Genome Project and improved methods of genetic epidemiologic analysis.
Genome-wide association studies (GWAS) became possible in the mid-2000s on the basis of the sequencing of the human genome, cataloguing of common genetic variants, and high-throughput tools for measuring genetic variation. However, genetic association studies long predated genomic technologies and are simple in concept: the frequency of a common variant is measured in individuals with the disease of interest and compared with well-matched controls (drawn from the population at large or unaffected family members). Now, this process is routinely performed with hundreds of thousands or millions of genetic variants from the genome-wide collection.
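The comparison at the heart of an association study can be sketched as a 2×2 table of allele counts in cases and controls. The example below computes an odds ratio and a Pearson chi-square test with one degree of freedom, without continuity correction. This is one common choice of test rather than the method of any particular study, and the counts are invented.

```python
import math


def allelic_association(case_alt: int, case_ref: int,
                        control_alt: int, control_ref: int):
    """Return (odds_ratio, chi_square, p_value) for a 2x2 allele-count table."""
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    if min(a + b, c + d, a + c, b + d) == 0 or b * c == 0:
        raise ValueError("table has an empty margin or undefined odds ratio")
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    # Pearson chi-square for a 2x2 table, no continuity correction.
    chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return odds_ratio, chi_square, p_value


# Hypothetical allele counts: 100 case alleles and 100 control alleles.
print(allelic_association(case_alt=60, case_ref=40,
                          control_alt=40, control_ref=60))
```

In a genome-wide study the same calculation is repeated for every measured variant, which is why the choice of significance threshold matters so much.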
Genetic association studies were pioneered in the context of the human leukocyte antigen (HLA) locus on chromosome 6. The HLA complex was discovered on the basis of its role in transplantation tolerance and is characterized by diverse allelic variation that can be measured by interactions of antibodies and antigens. By measurement of these protein-based (immunologic) readouts of the underlying genetic variation, HLA alleles were found to be a major determinant of susceptibility to infectious and autoimmune diseases. Starting in the 1960s, empirical data on human population genetics and genetic association studies were developed in the context of the HLA complex.
By the 1980s, tools of molecular biology made it possible to directly measure DNA variation (rather than using protein or phenotype measurements as surrogates for the underlying genetic variation), ushering in the modern era of human genetic research. In this pregenomic era, it was only practical to measure one or a small number of genetic variations in each study, limiting association studies to incomplete assessments of individual “candidate” genes selected on the basis of biologic criteria.
The study of candidate genes led to a modest number of robust and reproducible associations, such as the contribution of apolipoprotein E4 to Alzheimer disease; factor V Leiden to deep venous thrombosis; a 32-base deletion in the chemokine receptor CCR5 to HIV infection; common variants in the insulin gene to type 1 diabetes; and SNPs in the peroxisome proliferator–activated receptor γ and the β-cell potassium channel Kir6.2 to the risk for type 2 diabetes.
Early in the 2000s, comprehensive surveys of published genetic association studies showed that valid associations were few and far between, with many initial claims of association proving irreproducible, likely representing false-positive claims. One such analysis estimated that in the pre-GWAS era, only 10 to 20 bona fide associations had been documented of common genetic variants with common diseases.
A major reason for the state of this literature was the intrinsically low likelihood of finding a gene and variant contributing to any given disease. Each genome contains millions of genetic variants, and presumably only a small fraction of these influence disease. This is often described as a problem of “multiple hypothesis testing,” with the investigative community searching for associations between multiple genes, multiple variants in each gene, and multiple diseases. An alternative (bayesian) statistical framework frames this issue on the basis of low prior probabilities of association. Regardless, it is conceptually clear that much more stringent statistical thresholds (than the traditional P < .05) are required for declaring association of genetic variants and disease.
As in linkage analysis for mendelian traits, a key to success in association studies was the advent of genome-wide search, unbiased by prior hypotheses about biologic mechanisms. With the sequencing of the human genome, development of large-scale SNP databases, and tools for genotyping up to one million SNPs per individual, by 2005 it became practical to perform GWAS to identify genomic loci harbouring allelic variation. With a recognition that any given variant had a very low likelihood of truly being associated with disease, much more stringent statistical thresholds were deployed (typically requiring a P value of 10⁻⁷ or lower to declare “genome-wide significance”).
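One simple way to see why such stringent thresholds are needed is the Bonferroni correction, which divides the conventional significance level by the number of tests performed. The text does not say how its threshold was derived, so Bonferroni is used here only as an illustrative assumption; the test count of one million follows the genotyping scale mentioned above.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P value threshold that keeps the family-wise error near alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected number of null variants passing an uncorrected threshold."""
    return alpha * n_tests


N_SNPS = 1_000_000  # up to one million SNPs per individual
print(expected_false_positives(0.05, N_SNPS))
print(bonferroni_threshold(0.05, N_SNPS))
```

At the traditional P < .05, tens of thousands of variants would appear associated by chance alone, whereas the corrected threshold is of the same order as the genome-wide significance level quoted in the text.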
Age-related macular degeneration (AMD) provided an early success of GWAS. AMD is a typical common, polygenic disease; siblings of affected patients are perhaps three to six times as likely as unrelated individuals to become afflicted, and yet family-based linkage analysis revealed only modestly significant (and modestly reproducible) linkage results. The pathophysiologic defects that underlie AMD were largely unknown until it was found that a common coding polymorphism in the gene for complement factor H is a major risk factor for AMD. The variant (Y402H) has a high population frequency (approximately 35% in European populations) and increases risk by 2.5- to 3-fold in heterozygotes and by 5- to 7-fold in homozygotes. Multiple other complement factors have since been found to harbour common genetic variation that influences the risk for AMD in a highly reproducible manner, providing unambiguous information about the primary role of complement in this common disease.
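To get a feel for how many people carry such a variant, genotype proportions can be estimated from the allele frequency. The sketch below assumes that the quoted 35% is an allele frequency and that genotypes follow Hardy-Weinberg proportions; both are assumptions made for illustration, not statements from the text.

```python
def hardy_weinberg(variant_allele_freq: float):
    """Return (noncarrier, heterozygote, homozygote) genotype frequencies.

    Assumes Hardy-Weinberg proportions: p^2, 2pq, and q^2, where q is the
    variant allele frequency and p = 1 - q.
    """
    if not 0.0 <= variant_allele_freq <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    q = variant_allele_freq
    p = 1.0 - q
    return p * p, 2.0 * p * q, q * q


noncarrier, heterozygote, homozygote = hardy_weinberg(0.35)
print(noncarrier, heterozygote, homozygote)
# Share of the population carrying at least one copy of the variant.
print(heterozygote + homozygote)
```

Under these assumptions more than half of the population carries at least one copy, which shows how a common variant with a sizeable effect can account for a substantial share of disease risk.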
Since 2005, GWAS has been used to identify literally hundreds of novel genetic variants that show reproducible associations to a large variety of common human diseases. The field evolved a set of criteria and standards that largely eliminated the previous difficulties with irreproducible claims of association, making association studies a reliable method to identify genomic loci related to human diseases. The National Human Genome Research Institute of the National Institutes of Health maintains a catalogue of GWAS findings that, at the time of this writing, included 12,987 such associations across 1871 publications. This represents dramatic progress compared with the two dozen or so such findings known at the start of the decade.
The results of GWAS support a number of conclusions about the role of common genetic variants in common disease. First, nearly all diseases investigated by GWAS have yielded novel findings, in many cases yielding dozens to more than 100 independent common variants associated with risk of disease. Second, only a small fraction of these findings were previously known, confirming that an unbiased genetic mapping approach can provide new clues about the aetiology of common diseases. Third, most of the associations demonstrate modest odds ratios (on the order of 1.1-fold to 1.5-fold), indicating that the genetic nature of common disease is highly polygenic and that natural selection has likely purged alleles of large effect from the pool of common variants. Fourth, in only a few cases (perhaps 10%) does the associated haplotype carry a variant that alters protein structure; this suggests that much of the risk of common disease acts through effects on gene regulation rather than protein sequences. Fifth, in sum, the variants thus far identified explain only a modest fraction (ranging between 1% and 20%) of the estimated heritability of each disease, indicating that the rest of the inherited risk is due to some combination of common variants of more modest effect, rare variants not yet discovered, nonadditive interactions between genotypes and between genotype and the environment, or other (as yet unanticipated) factors.
Genome-wide approaches (not limited to candidate genes) can be thought of as testing the completeness of the sets of genes previously discovered for each disease by other approaches. For example, in the case of autoimmune diseases, many (perhaps half) of the findings from GWAS lie near a gene previously known to play a role in the immune system. Similarly, a substantial fraction of the genetic variants found to influence lipid levels lie near genes that were previously known to play a role in lipid biology (because they either carry rare mutations that contribute to mendelian forms of hyperlipidaemia or were discovered on the basis of laboratory studies). Examples such as autoimmune disease and lipids provide a reassuring alignment of mendelian genetics, biologic investigation, and the genes mapped by GWAS.
However, for the majority of diseases and of disease-associated genetic variants, the genomic regions showing association to disease are novel and do not contain any genes previously studied. One such case is type 2 diabetes, for which more than 80 independent genomic loci have been found to influence risk for disease, and yet only a handful were previously implicated by other methods. A second is myocardial infarction, for which perhaps one third of the SNPs lie near genes involved in low-density lipoprotein cholesterol, but the other two thirds do not contain any previously studied gene. These examples indicate that there are important gaps in our previous knowledge of pathophysiology and biologic mechanisms and that genome-wide approaches can point to high-priority candidates for study.
Although tantalizing, the results of GWAS have raised many more questions than they have answered. Each discovery implicates a particular genomic region, but it has proved challenging to establish which gene is responsible for the association. The difficulty arises in large part because so many of these common variants are noncoding, and methods to connect noncoding variation to the genes it regulates remain in their infancy. Even where novel genes are identified, much work is needed to discover their biologic and physiologic functions. Finally, GWAS findings explain only a fraction of the estimated heritability of most diseases, leaving open the question of which genes, and which types of variants and genetic effects, explain the remainder.
Although much of human genetic variation is due to common DNA variants (such as those tested through GWAS), each of us also inherits many thousands of variants that arose more recently and that tend to be lower in frequency and more population specific. To the extent that such variants have large effects on phenotype, they might have been previously identified on the basis of family-based linkage studies of mendelian disorders. However, there certainly exists a large universe of lower-frequency variations that have effects too modest to have been recognized and identified in family-based linkage analyses and are too rare to have been captured by the first generation of GWAS.
The study of lower-frequency and rare variants is now practical owing to advances in technology for DNA sequencing. With dramatic drops in price and increases in throughput, it is increasingly routine to sequence individual genomes in the context of medical research (and, in the future, clinical practice). Such an approach will provide a much more complete assessment of genetic variation than was previously obtainable and will incorporate common as well as rare variants, but it also brings to the fore the major challenge of genome interpretation.
For mendelian diseases, the sequencing of individual genomes has made it possible to bypass family-based linkage analysis and positional cloning and instead directly to sequence all protein-coding genes in the genome (so-called exome sequencing) among affected and unaffected individuals. Since 2009, the use of exome sequencing has led to the identification of numerous genes for mendelian disorders that had proved intractable with previous methods. For example, we studied a family in whom four siblings displayed extremely low blood low-density lipoprotein cholesterol, high-density lipoprotein cholesterol, and triglyceride levels—an apparently recessive disorder termed familial combined hypolipidemia. Previous linkage studies had identified a chromosomal region in which the causal gene lay, but because of the prohibitively large number of genes in the region, causal mutations had not been found. Exome sequencing of DNA samples from two of the siblings revealed only one gene, angiopoietin-like 3 (ANGPTL3), that harboured rare DNA variants in both alleles in both siblings. Subsequent studies confirmed the presence of additional ANGPTL3 mutations in unrelated individuals with the same disease.
For common diseases, elucidation of the role of low-frequency and rare variants is just beginning. At the time of this writing, initial genome sequencing studies of hundreds or a thousand cases of common diseases (compared with appropriate controls) have yielded few findings. This is likely due to some combination of (1) the causal rare variants being lower in frequency and more modest in effect size (that is, not deterministic) and thus requiring large samples to achieve statistical significance; (2) the current limitations in our ability to recognize functional mutations from the sea of benign DNA variants, which is needed to increase signal compared with noise; (3) the need for improved statistical methods for relating rare variants to disease; and (4) the natural selection during human evolution, which shaped the overall balance of rare and common variants that contribute to each disease.
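The first of these points, the need for large samples, can be made concrete with a rough sample-size calculation. The sketch below is an approximation under stated assumptions rather than a method taken from any particular study: it compares carrier frequency between equal numbers of cases and controls with a two-sided normal-approximation test, and it assumes a significance threshold of 2.5 × 10⁻⁶ (a Bonferroni correction for roughly 20,000 genes). The function names and the example frequencies and odds ratios are illustrative.

```python
import math
from statistics import NormalDist


def carrier_freq_in_cases(control_freq, odds_ratio):
    """Carrier frequency in cases implied by the control frequency and odds ratio."""
    odds = odds_ratio * control_freq / (1 - control_freq)
    return odds / (1 + odds)


def cases_needed(control_freq, odds_ratio, alpha=2.5e-6, power=0.80):
    """Approximate number of cases (with an equal number of controls) needed
    to detect a difference in carrier frequency.

    Assumptions: two-sided test of two proportions, normal approximation,
    alpha defaulting to a hypothetical exome-wide threshold.
    """
    p0 = control_freq
    p1 = carrier_freq_in_cases(p0, odds_ratio)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p0 * (1 - p0) + p1 * (1 - p1)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p1 - p0) ** 2)


if __name__ == "__main__":
    for freq, odds_ratio in [(0.01, 5.0), (0.01, 2.0), (0.001, 2.0)]:
        n = cases_needed(freq, odds_ratio)
        print(f"carrier frequency {freq}, odds ratio {odds_ratio}: about {n} cases")
```

Under these assumptions the required sample grows quickly as variants become rarer or their effects more modest, which is consistent with studies of hundreds or a thousand cases yielding few findings.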
Much has been written about the future use of genetic prediction in clinical medicine, but a sober appraisal requires consideration of the natural history of each disease, the available approaches for presymptomatic prevention, and the predictive nature of each test. Where genetic prediction is strong, disease outcomes are serious, and prevention exists, the combination can be of great clinical value. For example, in hemochromatosis, knowledge of genetic risk and measurement of iron stores allow presymptomatic phlebotomy, a safe and effective approach that reduces the development of end-organ damage and that would otherwise not be used. Similarly, testing for BRCA mutations in at-risk individuals provides valuable information about cancer risk, allowing women to choose between intensive monitoring and preventive surgery (mastectomy and oophorectomy) to reduce risk of cancer. What these examples share is that the disease is relatively rare, a robustly measured genetic risk factor dramatically increases risk, and an established prevention exists that otherwise (because of cost, convenience, or risk) would not be used.
For most common diseases, the role of genetic prediction remains unclear. This is because the disease is common, and genetic risk (as we understand it today) is probabilistic rather than deterministic in nature. Thus, the discrimination in risk due to genetics is much more limited. Moreover, in many cases, it is the characteristic of available interventions (rather than the genetic test per se) that limits utility. For example, some prevention strategies, such as diet and lifestyle modification for type 2 diabetes, are useful for everyone. In such settings, identification of a high-risk population is either of limited use or could even be counterproductive (if a focus on high-risk individuals ended up denying the rest of the population a worthwhile and safe prevention strategy). In other cases, we simply lack a proven preventive intervention, and thus risk estimation alone is not what limits progress. For example, the genetics of AMD has identified common variants with substantial effects on risk and a cumulative score of such variants that can stratify risk in the population by dozens-fold. However, at present, prevention for AMD involves smoking cessation, diet, and exercise, all of which are best deployed widely in the population rather than in a targeted manner.
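A short calculation shows why probabilistic risk gives limited discrimination when a disease is common. The numbers below are hypothetical and are not estimates for any particular disease: a lifetime risk of 30% in the population, a genetic score that flags 10% of people as high risk, and a twofold relative risk in the flagged group. The function name `stratify` is likewise just an illustrative label.

```python
def stratify(population_risk, flagged_fraction, relative_risk):
    """Split an overall disease risk into flagged and unflagged groups.

    Solves population_risk = f * RR * r0 + (1 - f) * r0 for the unflagged
    risk r0, then returns (risk_if_flagged, risk_if_unflagged,
    share_of_all_cases_arising_in_unflagged_people).
    """
    f = flagged_fraction
    risk_unflagged = population_risk / (f * relative_risk + (1 - f))
    risk_flagged = relative_risk * risk_unflagged
    if risk_flagged > 1:
        raise ValueError("inputs imply a risk above 100% in the flagged group")
    share_unflagged = (1 - f) * risk_unflagged / population_risk
    return risk_flagged, risk_unflagged, share_unflagged


if __name__ == "__main__":
    flagged, unflagged, share = stratify(0.30, 0.10, 2.0)
    print(f"risk if flagged:   {flagged:.1%}")
    print(f"risk if unflagged: {unflagged:.1%}")
    print(f"cases arising in unflagged people: {share:.1%}")
```

With these illustrative inputs, the flagged group has a risk of about 55% and everyone else about 27%, yet roughly 82% of all cases still arise among people who were not flagged. A prevention strategy restricted to the high-risk group would therefore miss most of the disease burden.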
To realize the value of genetic insights into disease, it will be necessary to develop new and more effective approaches to prevention that target causal mechanisms. One encouraging example involves the gene encoding proprotein convertase subtilisin/kexin type 9 (PCSK9) and risk of myocardial infarction. Mutations in PCSK9 were first identified through genetic mapping studies of rare families with very high levels of low-density lipoprotein cholesterol. Soon, candidate gene association studies of PCSK9 revealed the existence of common variants that reduced or eliminated the function of the PCSK9 protein; in one study, 2.6% of African Americans carried nonsense mutations in PCSK9. These “loss of function” variants in PCSK9, being common, could be studied in large populations for impact on clinical phenotypes and were soon shown to reduce plasma low-density lipoprotein cholesterol and to protect against coronary heart disease. This indicated that reduction in PCSK9 function would be expected to reduce risk of myocardial infarction through its effects on low-density lipoprotein cholesterol. Moreover, a small number of people were found to be homozygous for these loss-of-function PCSK9 mutations and, despite lacking immunoreactive PCSK9 protein, to be healthy and well. This indicated that even complete loss of PCSK9 function would likely be safe.
On the basis of these results, several companies have developed monoclonal antibody–based drugs targeting the PCSK9 protein. Preliminary data from clinical trials of these agents have demonstrated large reductions in blood low-density lipoprotein cholesterol levels, in some cases surpassing even the most potent statin drugs. A reduction in risk of myocardial infarction is predicted on the basis of the genetic data for loss-of-function PCSK9 mutations as well as the experience with other drugs that lower low-density lipoprotein cholesterol. However, definitive outcomes trials remain important and, at the time of this writing, have not yet been completed.
Inherited factors contribute substantially to common as well as to rare diseases. Mendelian disorders are typically caused by rare mutations in the protein-coding regions of genes. On the basis of the results of GWAS, it is clear that common variants play a role in common disease, typically with modest effects that often arise through changes in gene regulation rather than in protein structure. Each person carries a deep reservoir of less common and rare genetic variations that will be tested in coming years for a role in disease. It seems reasonable to expect that in the coming decade, systematic and integrative analyses of millions of genome sequences will define lists of genes and variants (both common and rare) that contribute to each human disease. If this international effort incorporates large and epidemiologically valid samples and takes into account factors that could bias results, such as case ascertainment, it should provide reference information needed to annotate each individual genome sequence for disease risk.
However, success in identifying genes and mutations will prove of value only if it leads to improved prediction, diagnosis, understanding, and treatment. Biologic understanding requires bedside-to-bench research, in which genes found mutated in patients are studied in the laboratory. It will be necessary to place new genes into known (and as yet unrecognized) biologic pathways and to understand how dysfunction and dysregulation lead to disease. In some cases, such as the role of complement in AMD, initial answers may come quickly; in others, in which the relevant pathobiology is as yet unknown, the information to be gleaned from following these clues is unpredictable. In the fullness of time, genetic insights gleaned from patients should lead to a new generation of therapies that more directly target the underlying root causes of risk in the population.
New approaches to “precision” medicine will require not only development of predictive models and new therapies but also a foundation of clinical trial evidence that demonstrates benefit. That is, it is not sufficient simply to hypothesize that a genetic test or targeted therapy benefits patients; the hypothesis must be tested in controlled trials. Such clinical trials will involve measuring DNA variation in study participants and testing approaches to intervention (prevention or treatment) based on such information. Genetic tests may prove predictive without being useful, and only careful research can demonstrate value and justify society’s investment in their use.
Whereas much remains uncertain, it is clear that genetic and genomic information is accumulating at a staggering rate and holds much potential as well as challenges for the future of medicine. Rather than leaping to deploy genomics in medicine before value has been shown, it is incumbent on us to carefully develop and critically evaluate the use of this new technology to inform and improve the understanding, prevention, and treatment of disease.