WHY CAN’T MOST MENDELIAN DISEASES BE SOLVED?
As noted above, in approximately 70%–75% of cases the mutation or mutations underlying a mystery genetic disease are not found. There are many likely reasons for this impasse. One important reason is that so many variants are typically present that it is difficult to narrow down which one or ones could be causing the disease. Focusing on variants that are likely to be damaging based on their predicted effect (e.g., inactivation of a gene product) yields a smaller set of candidate disease-causing variants, but that set often still contains on the order of five to twenty mutations.
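The filtering step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Variant` record, the consequence labels in `DAMAGING`, the gene names, and the population-frequency cutoff are all hypothetical, and real pipelines use annotation tools and thresholds not described here.

```python
from dataclasses import dataclass

# Hypothetical consequence labels treated as "likely damaging" in this sketch.
DAMAGING = {"stop_gained", "frameshift", "splice_site", "missense_deleterious"}


@dataclass(frozen=True)
class Variant:
    gene: str          # placeholder gene name
    consequence: str   # predicted effect on the gene product
    pop_freq: float    # allele frequency in a reference population (assumed field)


def candidate_variants(variants, max_pop_freq=0.001):
    """Keep rare variants whose predicted effect is likely damaging.

    The rarity cutoff is an assumption of this sketch, not a value from the text.
    """
    return [
        v for v in variants
        if v.consequence in DAMAGING and v.pop_freq <= max_pop_freq
    ]


# Made-up input: four variants, two of which pass both filters.
variants = [
    Variant("GENE_A", "synonymous", 0.0),
    Variant("GENE_B", "stop_gained", 0.0001),
    Variant("GENE_C", "frameshift", 0.05),
    Variant("GENE_D", "missense_deleterious", 0.0),
]
candidates = candidate_variants(variants)
print([v.gene for v in candidates])
```

Even with both filters applied, a real patient would typically be left with several candidates rather than one, which is the difficulty the rest of this section addresses.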
Analyzing larger numbers of affected and unaffected family members can be useful for trimming the list of candidate disease-causing variants further. In some cases, the clinical features of the genetic disease suggest which underlying molecular mechanisms might be disrupted, and candidate disease-causing variants can be prioritized based on their known or predicted contribution to those mechanisms. Sometimes one of the candidate variants is in a gene that has been extensively studied in mice. If mutation of the mouse counterpart of the human gene results in traits similar to the genetic disease being studied, this suggests that the variant might be disease-causing. However, for patients with only a small number of family members available for analysis, and for diseases with more general clinical features, such as developmental delay, it is often difficult to determine which mutation is causing the disease.
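To illustrate how additional family members trim the candidate list, here is a minimal sketch. It assumes the simplest possible model: a fully penetrant variant carried by every affected relative and by no unaffected relative. The function name, variant IDs, and family data are hypothetical. Recessive or de novo inheritance would need genotype-aware rules that this sketch does not cover.

```python
def segregating_variants(candidates, carriers, affected, unaffected):
    """Return candidates carried by all affected and no unaffected relatives.

    carriers maps a variant ID to the set of family members who carry it.
    This encodes a simple, fully penetrant model (an assumption of the sketch).
    """
    kept = []
    for var in candidates:
        who = carriers.get(var, set())
        in_all_affected = affected <= who          # every affected member carries it
        in_no_unaffected = not (unaffected & who)  # no unaffected member carries it
        if in_all_affected and in_no_unaffected:
            kept.append(var)
    return kept


# Made-up family: mother and child1 are affected, father and child2 are not.
candidates = ["var1", "var2", "var3"]
carriers = {
    "var1": {"mother", "child1"},
    "var2": {"mother", "child1", "child2"},
    "var3": {"child1"},
}
remaining = segregating_variants(
    candidates,
    carriers,
    affected={"mother", "child1"},
    unaffected={"father", "child2"},
)
print(remaining)
```

With only one or two relatives sequenced, few candidates are eliminated this way, which is why small families are harder to solve.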
A powerful way to identify the likely causative mutation in a rare disease is to compare sequencing information from two or more children whose shared symptoms suggest that they have the same disease. If the children have candidate disease-causing variants in the same gene, there is a high probability that those variants are causative. These “recurrent” rare mutations are extremely helpful because the chance that two children with a very similar rare disease both carry rare damaging mutations in the same gene, without those mutations being disease-causing, is very small. Even if the children do not have candidate disease-causing variants in the same gene, variants in genes that have similar roles are likely to be disease-causing. For example, many genes implicated in hearing loss contribute to inner ear structure, and many genes implicated in cardiomyopathy contribute to the structure of cardiac muscle. If predicted deleterious mutations are found in genes that operate in the inner ear or in cardiomyocytes (the cells that make up the heart muscle) in a patient with hearing loss or cardiomyopathy, respectively, those mutations are likely culprits.
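The search for recurrent mutations amounts to intersecting candidate gene lists across patients. The sketch below shows one way this could be done; the function name, patient IDs, and gene names are placeholders, and it matches on identical genes only, not on genes with similar roles.

```python
from collections import defaultdict


def recurrent_genes(candidates_by_patient, min_patients=2):
    """Map each gene to the patients with a candidate variant in it,
    keeping only genes seen in at least min_patients patients."""
    patients_by_gene = defaultdict(set)
    for patient, genes in candidates_by_patient.items():
        for gene in genes:
            patients_by_gene[gene].add(patient)
    return {
        gene: sorted(patients)
        for gene, patients in patients_by_gene.items()
        if len(patients) >= min_patients
    }


# Made-up candidate lists for two unrelated children with similar symptoms.
candidates_by_patient = {
    "patient_1": {"GENE_A", "GENE_B", "GENE_C"},
    "patient_2": {"GENE_C", "GENE_D"},
}
shared = recurrent_genes(candidates_by_patient)
print(shared)
```

Extending the match to genes with similar roles would require a gene-to-function annotation, which is outside the scope of this sketch.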
In an interesting example of the power of using recurrent mutations to determine the cause of a childhood disease, we sequenced the DNA of a child with developmental delays and other problems, and of her two unaffected parents (Figure 1). The parents had been searching for years to find out what was wrong with their child. After we narrowed the list to eight possible mutations, it remained unclear which one caused the disease. A second family across the country then reported that their child had a similar disease and a damaging mutation in one of the eight candidate genes (NGLY1). The match was made and the disease solved! This story demonstrates the contribution of dedicated family members and the importance of sharing information.
Figure 1. A picture of Grace Wilsey. Screening the thousands of variants in her DNA identified mutations in eight candidate genes. The causative mutation and the affected gene were determined once the DNA of another child with similar characteristics was sequenced at Duke University and found to carry a mutation in the same gene, NGLY1. Photo courtesy of Matt and Kristen Wilsey.
In addition to the complexity of sorting through many candidate gene variants, a mystery genetic disease may remain unsolved because no candidate variants were found in the first place. Current sequencing technologies do not cover 100% of the genome. Moreover, different sequencing approaches have different limitations. Exome sequencing analyzes the protein-coding regions of the genome, which represent 1%–2% of genomic DNA; hence, variants that affect non-protein-coding DNA (e.g., DNA that functions in regulating the expression of a particular gene) are not detected. Also, copy number variations (e.g., extra or missing copies of a gene) and other structural variations are more difficult to detect with exome sequencing; whole genome sequencing is better for detecting such variants. Exome sequencing, however, targets protein-coding regions at much greater coverage than whole genome sequencing (i.e., each base is sequenced an average of more than 80 times for exome sequencing vs. an average of 30 times for whole genome sequencing), thus providing greater sensitivity for detecting variants within coding regions.
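The link between coverage and sensitivity can be illustrated with a simple binomial calculation. The sketch below assumes an idealized site where each read independently carries the variant allele with probability 0.5 (a heterozygous variant), and it calls the variant "detected" when at least `min_alt_reads` reads support it. The threshold of 10 reads is an arbitrary assumption for illustration. The model ignores sequencing error and the uneven coverage seen in real data.

```python
from math import comb


def prob_detect(depth, min_alt_reads=10, alt_fraction=0.5):
    """Probability that at least min_alt_reads of `depth` reads carry the
    variant allele, under an idealized binomial model."""
    # Sum the probabilities of seeing fewer than min_alt_reads supporting reads.
    p_miss = sum(
        comb(depth, k) * alt_fraction**k * (1 - alt_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_miss


# Compare the average depths quoted in the text: 30x (genome) vs. 80x (exome).
for depth in (30, 80):
    print(depth, prob_detect(depth))
```

Under these assumptions the detection probability is higher at a depth of 80 than at a depth of 30, and the gap widens for variants present in a smaller fraction of reads (a lower `alt_fraction`).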