GENOMICS TECHNOLOGY IN CLINICAL DIAGNOSTICS
During the last two decades, and especially since the completion of the first human genome draft sequence in 2001,1,2 we have witnessed an unprecedented expansion of our molecular genetics capabilities for both discovery and diagnostics. While certain advances arose out of the demands of the Human Genome Project itself, others have only been made possible because of its successful outcome. With the initial sequence in hand, a series of ambitious projects has enabled the in-depth investigation of variation in the human genome (e.g., 1000 Genomes Project),3 the identification of functional elements encoded within the human genome (Encyclopedia of DNA Elements—ENCODE—Project),4 and mapping of the genetic architectures of common cancers (e.g., The Cancer Genome Atlas—TCGA).5–8 Each of these projects represents much more than just an achievement or a milestone in human genetics. These human genome sequences have become one of the most important and frequently used tools available to human genetics researchers and diagnosticians. Many of the newer genomics technologies described in this chapter relied on genomics data for their creation, and others (particularly next-generation sequencing) are dependent on the fruits of the Human Genome Project to perform basic analyses.
New molecular biology techniques are always initially adopted by research laboratories, and it is in this setting that recent genomics technologies have so far been the most transformative. In many ways, research in genetics is almost unrecognizable compared with the state of the science even five or ten years ago. In the clinical setting, there is more at stake when it comes to replacing traditional, proven technologies with novel applications, so the pace of change is understandably more cautious. Still, the integration of next-generation technologies into the clinical laboratory has begun in earnest, and it is only accelerating.
The number of applications of human genomic analysis in modern clinical practice is vast and is constantly expanding due to new discoveries and technological breakthroughs. However, clinical applications today generally focus on two main types of genomic variation:
• Inborn (or constitutional) variation: Constitutional genetic variants are present from the point of fertilization, and thus are found in the genome of every cell in the individual. As a result, essentially any cellular sample type may be acceptable for testing. Constitutional variants are commonly heterozygous or homozygous (i.e., present at allelic fractions of 50% or 100%, respectively), meaning that they are comparatively easy to detect and may be effectively assayed using lower sensitivity methods. Mitochondrial genetics are an obvious exception, as are sex chromosome genetics and other rare conditions such as chimerism.
• Somatic variation: Somatic DNA alterations occur in individual cells of the body through a variety of means, including DNA damage and replication errors as seen in both normal aging and neoplasia, as well as in normal cellular processes such as variable-diversity-joining (VDJ) gene segment recombination and somatic hypermutation in immune cells. In somatic mutation testing, success depends on proper sample selection. For example, in cancer diagnostics, pathological analysis of each sample should be conducted prior to testing to determine whether a sufficient number/proportion of tumor cells are present that may harbor mutations, in order to avoid false negative results. Due to anticipated sample heterogeneity and the potential for mutations with low allelic fractions, high-sensitivity methods may be required.
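The practical consequence of these allelic fractions for assay sensitivity can be shown with a short calculation. The following is an illustrative sketch (the function name and numbers are our own, assuming a diploid locus with no copy number changes):

```python
# Illustrative sketch: the expected variant allele fraction (VAF) of a
# heterozygous variant, given the fraction of cells in the sample carrying it.
# Assumes a diploid locus with no copy number changes.

def expected_vaf(carrier_fraction):
    """Fraction of alleles at the locus expected to carry the variant."""
    mutant_alleles = carrier_fraction * 1   # one mutant allele per carrier cell
    total_alleles = 2.0                     # every cell contributes two alleles
    return mutant_alleles / total_alleles

print(expected_vaf(1.0))  # 0.5 — constitutional heterozygous variant (all cells)
print(expected_vaf(0.2))  # 0.1 — heterozygous somatic mutation, 20% tumor cells
```

A constitutional heterozygous variant presents at a 50% allelic fraction regardless of sample composition, whereas the same heterozygous mutation confined to a 20% tumor-cell population yields only a 10% allelic fraction, which lower-sensitivity methods may miss.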
The full length of the haploid human genome is approximately 3 billion base pairs. Remarkably, clinically relevant genetic anomalies can be of any size, from the smallest single base substitutions to macroscopic chromosomal defects, altered chromosomal numbers, and even altered copy number of the entire genome (e.g., triploidy or tetraploidy in partial hydatidiform molar pregnancies9). This size range of possible anomalies, covering nine orders of magnitude, is akin to the difference between the length of one of your fingernails and the circumference of the earth. This represents a remarkable challenge from the point of view of genetic analysis and is the reason why so many different genetic analysis technologies and strategies exist. Nearly every tool is best suited to interrogate anomalies of a certain size range, and it is critical to keep in mind the size scale of expected anomalies and their anticipated location when planning any genetic investigation or diagnostic. Figure 10.1 shows some of the most common genetic analysis technologies and the size scales to which they are best suited. This should serve as a useful reference for the following discussion.
Figure 10.1 Depicted are the major genomic analysis technologies discussed in this chapter with the approximate genomic size scales to which they are best suited for detection of anomalies. It should be noted that certain anomalies, such as balanced translocations, require separate consideration. For example, such translocations may be detected by cytogenetics but not by CGH/SNP arrays.
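The roughly nine-orders-of-magnitude span described above is easy to check with quick arithmetic. In this throwaway sketch, the fingernail and Earth figures are rough, commonly cited values, not from the text:

```python
import math

# Span of clinically relevant anomaly sizes: from a single base substitution
# (1 bp) to gain or loss of the entire haploid genome (~3 billion bp).
smallest_bp, largest_bp = 1, 3_000_000_000
print(round(math.log10(largest_bp / smallest_bp), 2))  # 9.48 orders of magnitude

# The analogy: a fingernail (~1 cm) versus Earth's circumference (~40,075 km).
fingernail_cm, earth_cm = 1, 40_075 * 1000 * 100
print(round(math.log10(earth_cm / fingernail_cm), 2))  # 9.6 — a comparable ratio
```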
A full discussion of all genetic and genomic analysis techniques is beyond the scope of any one chapter, so here we will focus on a sampling of the most common technologies in use or in development in the diagnostic setting. We will begin by covering more traditional methods of genetic analysis before moving to a discussion of today’s modern genomics tools.
TRADITIONAL GENETIC AND GENOMIC ANALYSIS TECHNIQUES
Though it is somewhat of an arbitrary distinction, let us state that small genetic anomalies are those that are less than approximately 1000 base pairs. These include mainly single base substitutions and small insertions, deletions, or insertion/deletions (with both loss and gain of DNA). There are numerous ways that these small anomalies can be detected (including newer genomics technologies), but traditionally the most common methods in both the research laboratory and clinic are based on polymerase chain reaction (PCR). Larger anomalies cover the remaining six orders of magnitude, and a variety of techniques is utilized in their detection.
DETECTION OF SMALL GENETIC VARIANTS
The invention of PCR in 1983 stands as one of the most significant developments in the history of biology, as it was the first technique to make targeted genetic analysis (including sequence analysis) practical and straightforward.10 Because of the size of the genome, bulk genomic DNA contains only a vanishingly small proportional amount of any individual sequence of interest. Before the introduction of PCR, this represented a nearly insurmountable “signal-to-noise” problem for the investigation of genomic loci. Essentially the only available direct method not based on fragment cloning was Southern blotting (described below).
Briefly, PCR requires the design of short oligonucleotide primers that flank a sequence of interest. Primers, free nucleotides, and polymerase enzyme are added to the target DNA. The target DNA strands are separated (denatured) by increasing the temperature, and upon cooling, the target sequences are bound by the primers, which are then extended by the polymerase to copy the template. The cycle can be repeated simply by repeating the temperature changes. Because amplification proceeds in an exponential fashion, 40 rounds of PCR theoretically can produce a 2^40 (approximately one trillion)–fold amplification from as little as one original template molecule. Thus, the procedure essentially transforms a dilute sample of heterogeneous genomic DNA into a concentrated clonal solution of amplicon copies of the desired target sequence. This amplified product is then highly amenable to a large variety of downstream analytics, including Sanger sequencing (Figure 10.2) to determine its sequence and identify mutations or variants. The progress of a PCR reaction itself can also be measured via fluorescent markers, from which the amount of initial template in the reaction can be inferred. This application, quantitative PCR (qPCR), is used routinely in diagnostics laboratories for a variety of purposes, including viral load testing.
Figure 10.2 Sanger sequencing. Following PCR amplification, many methods are available to derive length and sequence information from a target amplicon. One common and important method is Sanger sequencing. A sequencing primer binds the amplified template DNA and is extended by a polymerase in the presence of fluorescent dead-end (terminator) nucleotides, producing labeled fragments of different size. These are separated by capillary electrophoresis to produce characteristic sequencing plots, from which the DNA sequence may be read.
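The exponential arithmetic behind PCR amplification and qPCR quantification can be sketched as follows. This is an idealized model assuming perfect doubling each cycle (real reactions have efficiencies below 100%), and the function names are our own:

```python
# Idealized sketch of PCR/qPCR arithmetic, assuming perfect doubling each
# cycle; real reactions are less efficient.

def amplification(cycles, efficiency=1.0):
    """Fold amplification after a given number of thermal cycles."""
    return (1 + efficiency) ** cycles

print(f"{amplification(40):.2e}")  # 1.10e+12 — 2^40, about a trillion-fold

# qPCR inverts this relationship: the cycle number at which fluorescence
# crosses a threshold (Ct) reflects starting template. Each additional Ct
# cycle implies roughly half as much initial template.
def relative_template(ct_sample, ct_reference, efficiency=1.0):
    """How much more (or less) template the sample had than the reference."""
    return (1 + efficiency) ** (ct_reference - ct_sample)

print(relative_template(22.0, 25.0))  # 8.0 — sample held ~8x the reference template
```

A lower Ct means more starting template; the three-cycle difference above corresponds to roughly a 2^3 = 8-fold difference in initial template, which is the principle underlying viral load quantification.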
Although it is possible to perform PCR on DNA segments longer than a few thousand bp, the technical challenges increase beyond this point and ultimately require specialized polymerases and protocol modifications to prevent the amplification of non-specific products.11 Clinical procedures seldom include amplicons greater than 1kb. Today, PCR is used in too many clinical diagnostic applications to easily count. Many are described in other chapters, including such applications as:
• Assaying inborn disease mutations
• Identifying cancer mutations and cancer-related translocations
• Identity testing, for both patient and forensic samples
• Identifying/quantifying viral and bacterial genomes for infectious disease diagnostics
• Analyzing and quantifying mRNA expression
DETECTION OF LARGER SCALE ANOMALIES
There is a variety of traditional approaches for the analysis of larger genomic anomalies, the selection of which again depends critically on the exact size scale of the expected anomaly, the predictability of its location, as well as the particular application or question. When looking at these techniques, there is a general trend from molecular biology–based approaches at the smaller orders of magnitude, moving ultimately towards straightforward microscopy as anomalies become large enough to detect with the assisted eye.
For the largest scale anomalies, classical cytogenetics is a powerful traditional approach for carrying out a genome-wide scan at a relatively low cost.12 It requires first culturing cells and arresting them in metaphase (when chromosomes are condensed after replication) using a mitotic inhibitor such as colcemid, which blocks the microtubule polymerization necessary for forming the mitotic spindle apparatus. Cells are then exposed to hypotonic solution causing them to swell, and dropped onto glass slides, where they burst and locally scatter their chromosomes into “spreads.” Chromosomes may be counted at this stage, but typically they are stained via any of a number of different techniques, which serve to reveal structural details and allow individual identification.13 Giemsa banding (G-banding) is the most common method, which produces characteristic chromosomal bands.14 By this method, late-replicating, transcriptionally quiet, and A/T-rich DNA staining is more intense (G-positive) and early-replicating, transcriptionally active, and relatively G/C-rich DNA is stained more lightly (G-negative) (Figure 10.3). The highest quality G-banded preparations are able to yield up to 850 bands across the genome.15 Microscopic analysis of banded chromosomes assists with their individual identification and can reveal loss or gain of material down to a few million base pairs (megabases or Mb) in size, though exactly how small depends on the location of the defect and the resolution of the bands. It can also reveal events such as translocations and inversions, even when no net gain or loss of material occurs (compare with comparative genomic hybridization [CGH] and single nucleotide polymorphisms [SNP] arrays, see below). Other banding methods are available that can reveal complementary information about the structure and organization of the genome at the chromosomal level. 
One particular method, Q-banding, involves a fluorescent dye such as quinacrine, DAPI, or Hoechst 33258, that binds preferentially to A/T-rich sequences, producing a banding pattern comparable to G-banding that can be used during fluorescence in situ hybridization (FISH) experiments (see below).16
Figure 10.3 Conventional cytogenetic analysis by G-banding. Depicted is a chromosomal spread from a two-year-old female with developmental delay, short stature, and microcephaly. Cytogenetic analysis revealed a translocation between chromosomes 7 and 13 (46,XX,t(7;13)(q21.2;q12.3)), as shown by the two arrows on the karyotype. A deletion of chromosomal material at the breakpoint region might be expected based on the clinical scenario, yet the translocation appears to be balanced. However, such is the limited resolution of cytogenetics that even megabase-scale anomalies may not be readily detectable. (Courtesy of Sian Morgan, Cytogenetics Laboratory, Institute of Medical Genetics, Cardiff, UK)
Though developed many decades ago, cytogenetics is still widely used in clinical practice today for both constitutional and oncology diagnostics. It is a first-line test for children with developmental abnormalities and for fetal samples obtained via amniocentesis or chorionic villus sampling, when there is reason to suspect a structural or numerical chromosomal abnormality.17,18 There is also a long and rich history of cytogenetic analysis of hematological malignancies, going back to the discovery of the Philadelphia chromosome in chronic myelogenous leukemia in 1960 (the first observation of a recurrent genetic anomaly in cancer),19 followed soon after by the elucidation of the t(9;22) translocation from which it arises.20 Today, cytogenetics is still heavily relied upon in the diagnostic work-up of these diseases, particularly the leukemias and myelodysplastic syndromes (MDS). There are many cytogenetic signatures for these diseases, including some that are disease-defining or major diagnostic criteria, and others that provide therapy-related or prognostic information.21,22,23 For instance, the 5q minus syndrome is a specific subtype of MDS showing deletions of the long arm of chromosome 5 (often involving 5q31-5q32) that is associated with an overall favorable prognosis and a high likelihood of response to lenalidomide therapy.24
In contrast to the case in heme malignancies, there is essentially no routine clinical utility of cytogenetics for solid tumors. Solid tumors tend to have many more cytogenetic anomalies, and chromosomes prepared from these tumors are generally more condensed and thus have lower banding resolution, interfering with interpretation. Though modern genomics technologies are now beginning to be applied to these tumors, diagnostic analyses for many large rearrangements have typically been performed by FISH.
Fluorescence In Situ Hybridization (FISH)
In situ hybridization techniques, in particular FISH, have allowed study of the structure of the genome at a level of detail greater than that seen by conventional banding techniques.25,26,27 However, unlike cytogenetics, FISH is a targeted assay requiring foreknowledge of the expected genetic lesion. The method depends on the specific hybridization of a probe DNA sequence to its complementary sequence in the genome. Labeling the probe with a fluorescent dye allows its location to be revealed by fluorescence microscopy. The probes used in FISH experiments are typically derived from human sequences cloned into bacterial artificial chromosomes (BACs), with sizes of approximately 100kb. The creation of an extensive human BAC library (containing approximately 32,000 BACs tiled across the genome) was actually the first critical step of the Human Genome Project itself, effectively breaking the genome into “bite-sized” pieces that could be individually sequenced, with subsequent assembly to create the final sequence.28 This library now serves as a main source of DNA for FISH probes, thus there are only very few regions of the genome not amenable to FISH experiments.
FISH can be performed either on metaphase chromosome spreads or on preparations of cells in interphase.29 When applied to metaphase spreads, specific signal from sites of probe binding can be evaluated in the context of Q-banding data. Thus, metaphase FISH allows the counting of probe binding sites, determination of the identity of chromosomes showing probe staining, as well as the sub-chromosomal location of binding events (Figure 10.4). This is a wealth of useful information, but as with cytogenetics, this technique relies on the growth of cells in culture, which is expensive and can take days to weeks. Not all specimen types may grow reliably, and in certain scenarios, this process cannot be completed in a clinically relevant time frame. For example, in cases of suspected acute promyelocytic leukemia, rapid identification of the t(15;17) translocation producing the PML-RARa fusion oncogene is important in order to inform decisions regarding treatment with all trans retinoic acid (ATRA).30 In practice, this is typically performed either via reverse transcription of RNA to DNA followed by PCR (RT-PCR) or via interphase FISH.
Figure 10.4 Metaphase FISH of a patient with DiGeorge syndrome (22q11.2 deletion syndrome). Two differently colored probes are applied: a control probe targeted to the distal portion of the long arm of chromosome 22 (green) and a test probe targeted to the DiGeorge region (red). Two copies of chromosome 22 are present, but only one contains the sequence matching the DiGeorge test probe, indicating the presence of a deletion on the other chromosome. (Courtesy of Dr. Peter Thompson, Cytogenetics Laboratory, Institute of Medical Genetics, Cardiff, UK)
Interphase FISH can be readily performed on essentially any sample that contains nucleated cells, with no requirement for culture. This includes smears of any cellular bodily fluid, tissue touch preparations and frozen sections, and even formalin-fixed, paraffin embedded (FFPE) tissue sections. Aside from some differences in the preparation of cells and target DNA, the underlying concept is the same as for metaphase FISH. The difference lies in the state of the chromatin of non-metaphase cells, which is uncondensed and dispersed throughout the nucleus. Though the probe lengths may be on the order of 100 kb, this is still short enough to produce staining in discrete spots within the nucleus. Of course, by this technique, no contextual chromosomal information is produced. This is a limiting factor, yet the process still has many important applications:
• Chromosome counting: Individual chromosomes can be counted using probes targeting specific chromosomes, though with the caveat that structural data about the intactness of the entire chromosome will not be available. A common example is rapid aneuploidy screening on amniocentesis samples using centromeric probes for chromosomes 13, 18, 21, X, and Y (Figure 10.5a).
Figure 10.5 Interphase FISH. A) For rapid prenatal diagnosis of common aneuploidies, FISH can be performed on cells in interphase obtained at amniocentesis. In this example, staining with a probe for the centromeric region of chromosome 21 is notable for three signals in every cell, which is diagnostic for Down syndrome. Unlike conventional cytogenetic analysis, this technique only indicates the number of copies of the probe region, which does not necessarily equate to the number of copies of whole chromosomes. B) Translocations and other rearrangements with specific breakpoints can be detected using multicolor “break-apart” FISH. Shown are interphase cells from a 36-year-old patient with newly diagnosed acute myeloid leukemia (AML). The cells are stained with two probes that target the CBFB gene: a red probe that binds just upstream (5′) of the gene, and a green probe that binds just downstream (3′). In a normal cell (left), the probes produce essentially overlapping spots, which can appear orange. In a cell harboring an inversion of chromosome 16 (right) which produces the CBFB-MYH11 fusion gene, one copy of the gene shows physical separation of the probe spots, indicating a chromosomal breakage between the probe binding sites. (Courtesy of Dr. Peter Thompson, Cytogenetics Laboratory, Institute of Medical Genetics, Cardiff, UK) (Courtesy of Dr. Gordana Raca, Cancer Cytogenetics Laboratory, Department of Medicine, University of Chicago, Illinois)
• Deletion/duplication analysis: Interphase FISH allows the counting of gene or target region dosage; for example, using probes to look for HER2 gene amplifications in breast cancer that predict a favorable response to trastuzumab therapy.31
• Translocation analysis: Translocations and large inversions can be detected with high sensitivity using a common technique called “break apart FISH,” which uses two differently colored probes targeted immediately upstream and downstream of a gene of interest. The two probes produce essentially overlapping spots in the normal state, but show physical separation if a copy of the gene has been involved in a translocation event (Figure 10.5b).
Multicolor FISH and Spectral Karyotyping
In many types of cancer, large numbers of cytogenetic alterations (translocations, etc.) can frequently interfere with definitive chromosomal identification by classical cytogenetics. Marker chromosomes (the term for those that are unidentifiable) can be very complex assemblies of multiple chromosomal parts. Technically, multiple individual FISH experiments could be performed to attempt to identify component parts of marker chromosomes, but in practice this may be infeasible. Another approach is to perform FISH using probes generated from individual whole chromosomes (chromosome painting), which would light up, not only all of the target chromosome, but also any part of a marker chromosome derived from it. If all 24 different chromosome paints are applied, the technique becomes even more sophisticated and is known as multicolor FISH (M-FISH), which was further developed as spectral karyotyping (SKY).32,33 This technique, though mostly used in the research setting, produces wonderfully detailed images of chromosome spreads (Figure 10.6), and can help resolve even some of the most complex marker chromosome rearrangements.34,35
Figure 10.6 FISH probes covering entire chromosomes can be tagged with chromosome-specific fluorescent dye signatures and hybridized to metaphase spreads to produce M-FISH or spectral karyotyping (SKY) images. The example shown is an analysis of the colon cancer cell line SW480. This allows for the in-depth characterization of complex “marker” chromosomes, including one that contains material from chromosomes 3, 10, and 12 (seen in the chromosome 10 box). This type of determination would be impossible with conventional cytogenetic banding techniques. (Courtesy of George Poulogiannis, Department of Pathology, University of Cambridge, England)
Southern Blotting
Southern blotting involves restriction digestion of genomic DNA into reproducible fragments, size separation by gel electrophoresis, and transfer to a membrane. The membrane is then hybridized with a specific probe to detect a particular genomic fragment, with the probe labeled with a radioactive isotope or other similar system to produce sufficient signal amplification. Labeled bands are then analyzed to determine if they are of the expected size.36 Southern blotting can show insertions or deletions if they are large enough to affect the migration of the probed band, but very small-scale variants may only be identified if they destroy or create a restriction enzyme site, producing a new fragment size (i.e., a restriction fragment length polymorphism, or RFLP). There is significant labor associated with Southern blotting, and for many applications it is of only limited utility. Today it is used clinically for a few anomalies too large for PCR and too small for FISH. For example, in particular tri-nucleotide repeat disorders (e.g., CGG repeats in the FMR1 gene in fragile X syndrome), the repeat stretches can grow longer than 1000 bp, leading to failed amplification and false-negative PCR results. In these cases, Southern blotting is often necessary to ensure the detection of long repeats.37
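The RFLP principle can be demonstrated with a toy fragment calculation. EcoRI's recognition sequence (GAATTC) is real, but the sequences below are invented, and cutting at the start of the site is a simplification:

```python
import re

# Toy illustration of the RFLP principle: a point mutation that destroys a
# restriction site changes the fragment sizes a Southern blot probe detects.
# The sequences are invented; cutting at the start of the recognition site
# is a simplification of where EcoRI actually cleaves.

def fragment_sizes(seq, site="GAATTC"):
    """Fragment lengths produced by cutting at every occurrence of `site`."""
    cuts = [m.start() for m in re.finditer(site, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

normal  = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TTTT"
variant = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAGTTC" + "TTTT"  # site destroyed

print(fragment_sizes(normal))   # [4, 14, 10] — three fragments
print(fragment_sizes(variant))  # [4, 24] — a new, larger band on the blot
```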
Multiplex Ligation-Dependent Probe Amplification (MLPA)
MLPA is a recently developed technique that enables low cost, targeted copy number analysis.38 Each MLPA reaction relies on two oligonucleotide probes designed to hybridize side-by-side on the target. Probes that hybridize successfully are enzymatically ligated together, thus the amount of ligated product is proportional to the amount of template in the sample. Subsequently, successfully ligated probes can be amplified by PCR to enable quantification. Use of tagged sequences at the ends of the probes allows the amplification of many probe-sets using a single pair of primers, supporting greater multiplexing than does multiplex PCR (which uses many different primer pairs). Other similar assays exist, such as the molecular inversion probe (MIP) method, which relies on enzymatic extension/ligation to circularize (and thus protect from degradation) a specially designed probe upon binding to the appropriate target sequence.39 However, MLPA is the most frequently used in the clinical setting. Common applications include testing for deletions in cancer genes and Mendelian disease genes.40,41 Because the assay essentially provides pinpoint analysis, it can be designed with closely spaced probes to detect small deletions (e.g., one probe-set per exon of a gene), or with probes more spread out to infer the presence of larger chromosomal deletions/duplications. In this way it behaves like a small microarray (described below).
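The dosage logic of MLPA can be sketched as a simple normalization. This is a conceptual illustration only, not any vendor's algorithm; the probe names and signal values are invented:

```python
# Conceptual sketch of MLPA-style dosage analysis (not a vendor protocol):
# each target probe's signal is normalized against reference probes, then
# compared to a normal control sample. Probe names and values are invented.

def dosage_ratios(sample, control, ref_probes):
    sample_ref = sum(sample[p] for p in ref_probes) / len(ref_probes)
    control_ref = sum(control[p] for p in ref_probes) / len(ref_probes)
    return {p: (sample[p] / sample_ref) / (control[p] / control_ref)
            for p in sample if p not in ref_probes}

control = {"REF1": 100, "REF2": 110, "EXON1": 105, "EXON2": 95, "EXON3": 100}
sample  = {"REF1": 200, "REF2": 220, "EXON1": 210, "EXON2": 95, "EXON3": 100}

for probe, ratio in dosage_ratios(sample, control, ["REF1", "REF2"]).items():
    print(probe, round(ratio, 2))  # EXON1 1.0; EXON2 and EXON3 0.5 (deleted)
```

A ratio near 1.0 indicates two copies; near 0.5, a heterozygous deletion; near 1.5, a duplication. Note that EXON2 and EXON3 show ratios of 0.5 even though the sample's raw signals are not obviously abnormal, illustrating why normalization against reference probes is essential.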
NEWER GENOMIC ANALYSIS PLATFORMS
Over the last two decades, a host of newer genomic analysis platforms has been developed that allows for the simultaneous, parallelized interrogation of millions or billions of targets genome-wide. In many ways it is appropriate to think of these as “digital” versions of older “analogue” analyses: for example, in comparison to the visual data produced by cytogenetics and FISH, microarray or next-generation sequencing (NGS) technologies yield discrete results across quantized locations, and can be thought of as having resolution and amplitude range much like a digital image. This analogy is fitting because these systems produce so much digital data that whole fields of expertise in programming and computation have grown up around them, collectively termed genome bioinformatics. With this technology now migrating in wholesale fashion into clinical medicine, the potential ramifications for patient treatment and outcomes, care practices, and health records are enormous.
COMPARATIVE GENOMIC HYBRIDIZATION ARRAYS
Comparative genomic hybridization is what its name suggests: instead of preparing chromosomes and staining them with probes as in FISH, the genomic DNA itself is labeled and hybridized onto a solid surface dotted with individual probes to which it can hybridize.42,43 Each probe spot contains DNA from a specific genomic locus, such as individual BAC clones, spaced across the genome. In traditional CGH, the genomic DNA to be tested is labeled with a fluorescent dye of one color, and a second “normal” reference sample is labeled with a different color. Upon hybridization to the array, the relative binding by the two different samples is assessed at each probe location. At a given locus, if the test sample has more or less DNA than the reference (due to either amplification or deletion), the discrepancy will be revealed by a proportional color imbalance at that probe location. The result is a genome-wide assessment of local copy number, with the resolution determined by the probe density (Figure 10.7). It should be noted that array-based techniques such as CGH can only detect unbalanced chromosomal rearrangements where material is lost or gained. Balanced inversions and translocations are not detectable because the total amount of DNA at each locus remains unchanged.
Figure 10.7 Low-density CGH array analysis of the two-year-old patient discussed in Figure 10.3. Patient DNA (dark) and control DNA (grey) are labeled and hybridized together onto an array spotted with BAC clones. Chromosomes 7 and 13 are shown (the chromosomes involved in the translocation). Most array positions show equivalent binding (normal). However, at the positions corresponding to the breakpoint regions of both chromosomes are regions of reduced patient signal, indicating loss of chromosomal material. Thus, despite its appearance by conventional cytogenetics, the translocation is not balanced. It can be seen from the result that approximately 0.2 Mb of chromosome 13 and 8.42 Mb of chromosome 7 are lost, which is consistent with the patient’s presentation. (Courtesy of Sian Morgan, Cytogenetics Laboratory, Institute of Medical Genetics, Cardiff, UK)
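The core computation behind CGH analysis can be sketched as a per-probe log ratio. The signal values here are invented for illustration, and the calling thresholds are arbitrary:

```python
import math

# Simplified sketch of the core CGH computation: for each probe, the log2
# ratio of test signal to reference signal reports local copy number.
# Signal values are invented; a heterozygous deletion roughly halves the
# test signal at affected probes.

def log2_ratios(test_signal, reference_signal):
    return [math.log2(t / r) for t, r in zip(test_signal, reference_signal)]

test_signal      = [1000, 980, 510, 1020, 990]   # probe 2 lies in a deletion
reference_signal = [1000, 1000, 1000, 1000, 1000]

for pos, lr in enumerate(log2_ratios(test_signal, reference_signal)):
    call = "loss" if lr < -0.3 else ("gain" if lr > 0.3 else "normal")
    print(pos, round(lr, 2), call)  # probe 2: ~-0.97, consistent with one copy lost
```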
SNP CYTOGENOMICS ARRAYS
In recent years, BAC-based CGH arrays have largely been replaced by newer array systems that use short oligonucleotide probes, which are easier to mass produce and can be made with higher probe density. Array probe counts have reached the millions, offering as low as 1kb resolution. Additionally, many platforms are now based either entirely or in part on probes that interrogate genomic SNPs.44 One complicating factor when discussing more recent genomics technologies is that, for each technology, platforms may be available from a variety of different companies, each with different chemistries and workflows designed to avoid intellectual property entanglements. Such is the case with SNP arrays as well as next-generation sequencing platforms (discussed below), among others. However, suffice it to say that each of the major SNP array platforms can monitor copy number at each probed locus based on binding affinity, while establishing the zygosity of the SNP call at each location. Thus, they offer the same type of locus-specific dosage data as CGH arrays, but provide the added benefit of up to millions of SNP genotyping calls genome-wide, which is valuable for a number of different applications.
Such arrays have been the workhorses of genome-wide association studies (GWAS), which are efforts to uncover genetic underpinnings of complex, multifactorial human diseases or phenotypes. Assaying for many SNPs across the genome of many affected and normal individuals allows investigators to statistically link the phenotype with particular SNP markers that may lie in close proximity to the responsible genetic factor. This strategy takes advantage of the concept of linkage disequilibrium (LD), whereby closely adjacent genetic markers will tend to co-segregate in families rather than being inherited independently, as would distantly separated loci; for example, those on separate chromosomes. These types of studies have been used to identify SNPs that predispose to some amount of elevated risk for a variety of conditions (macular degeneration, heart disease, diabetes, etc.).45,46 Though there are disagreements regarding the clinical utility of this information and the applicability of the data across ethnic groups, certain companies now offer clinical testing using these array platforms, providing individualized risk assessments based on the results provided by GWA studies. We stress that these results should be considered and interpreted cautiously, and only under the guidance of appropriate experts trained in clinical genetics and genetic counseling.
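At its core, a GWAS compares allele counts between affected and unaffected individuals at each SNP. The basic association test can be sketched as follows (the counts are invented, and real studies apply stringent genome-wide significance thresholds to correct for the millions of SNPs tested):

```python
# Toy sketch of the basic GWAS association test: a Pearson chi-square on a
# 2x2 table of allele counts in cases versus controls. Counts are invented;
# real studies require genome-wide significance after multiple-testing
# correction, not the nominal single-test threshold shown here.

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

#                  risk allele, other allele (allele counts, two per person)
cases    = (300, 700)
controls = (200, 800)

chi2 = chi_square_2x2(*cases, *controls)
print(round(chi2, 1))  # 26.7 — far above 3.84, the nominal p < 0.05 cutoff at 1 df
```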
For traditional diagnostic testing, however, these arrays are proving valuable as a means of surveying the whole genome for deletions, duplications, and other anomalies that are too small to be visible by cytogenetics (Figure 10.8). While the absence of signal from one individual probe spot may represent noise, a cluster of absent signals can provide statistical confidence of a local genomic deletion. Thus, depending on overall and local probe density, these arrays can confidently call deletions as small as approximately 50 kb, up to 100 times smaller than those visible by cytogenetics. The SNP genotyping data can add confidence to such copy number identification, because a heterozygous deletion will show all “homozygous” SNP calls in that region. Additionally, SNP results can detect other copy-neutral anomalies such as uniparental disomy (which shows a normal number of DNA copies but the absence of heterozygous SNP calls) or mosaicism (which also has normal copy number but can show a variety of bizarre SNP allelic ratio patterns).47 Today, these arrays are widely used in the clinical setting for children with dysmorphic features and other developmental abnormalities, particularly when cytogenetics results are normal and when other targeted testing options (using FISH or PCR, e.g.) have been exhausted.48
Figure 10.8 SNP array analysis of copy number and genotype. Newer cytogenomics array platforms may contain millions of individual probes, and can assay the genome at a resolution above one probe per 1000 bp. Many commercial platforms are available, and many options exist with respect to array composition. Unlike traditional CGH, most modern arrays do not rely on comparative hybridization against a control sample. Instead, only the test sample is applied to the array, and the data are informatically compared against historical normal samples to derive copy number and genotype data. Arrays can be used to capture genotype calls genome-wide for GWA studies, etc., or viewed in plots such as the one shown to evaluate for chromosomal abnormalities. In this example, tumor DNA from a pediatric patient with neuroblastoma is analyzed. Only data from chromosome 11 are shown. At the end of the p-arm (left), the copy number assessment is normal, as is the genotype pattern (three signal ratios indicating AA, AB, and BB genotypes). Around the centromere (middle), the copy number is elevated, indicating duplication. The genotype pattern is complicated by the addition of an extra copy of this chromosomal segment. On the right, a deletion of most of the q-arm can be seen. The copy number plot shows decreased signal, and the associated genotype plot is consistent with reduced copy number, with each SNP positive for only the A or B genotype. (Courtesy of Dr. Gordana Raca, Cancer Cytogenetics Laboratory, Department of Medicine, University of Chicago, Illinois)
Perhaps the most consequential development in biology since the introduction of PCR is the advent of next-generation sequencing (NGS).49 NGS represents a truly transformational technological shift, because it offers for the first time the prospect of fast and inexpensive full genomic sequence analysis. The implications have been enormous, not just for clinical research but for virtually every field of biomedicine and basic biology. As of this writing, the full genome sequences of hundreds of animal species have been completed (along with much larger numbers of bacterial and viral genomes), each offering new possibilities for discoveries within and across species.50 The technique can be used to probe any aspect of biology related to nucleic acids, from chromatin structure, genetics, and epigenetics, to transcription, RNA processing, and more, and can be applied to any species, whether well characterized or novel.51
One of the reasons NGS is so exciting is that it is the first technology with the prospect of detecting genetic variation of every possible size scale. As with array platforms, a variety of NGS platforms exist that share many of the same important underlying features. As a group, these technologies circumvent the signal-to-noise problem of genomic DNA (or any other heterogeneous DNA sample type) in a fundamentally different way than PCR: by performing independent analyses of many individual DNA molecules from a large pool in a massively parallel manner. This is made possible by a critical first step whereby molecules are spatially separated from each other, essentially transforming a complex pool into a collection of isolated single molecules. The details of the separation are not critical for this discussion, but (for example) may involve separating DNA molecules into individual microbubbles (emulsion PCR) or spreading fragments across a surface covered with a “lawn” of complementary oligonucleotides. In order to produce enough signal from each molecule, local PCR amplification is performed, creating a population of local amplified clusters. Once the individual clonal groups are generated, enormous numbers of individual starting DNA molecules can be sequenced simultaneously, with separate data produced for each.52
Nearly any sample of DNA is compatible with NGS analysis. The DNA does not even have to be particularly intact, because NGS systems require that input DNA take the form of small fragments, typically a few hundred base pairs or less. This is typically achieved by ultrasonic fragmentation, although alternative methods (based on multiplex PCR, for example) also exist. Fragments also typically require that specific oligonucleotide adapter sequences be incorporated onto their ends to make them compatible with the sequencer. This process is called library preparation, and only sequencer-compatible libraries may be applied to the instruments. After focal amplification of individual input library DNA molecules, sequencing primers bind to the adapter sequences, and sequencing proceeds inward into the cloned fragment. Some platforms produce only one sequence read per fragment (single-end sequencing), while others allow for a second read from the opposite end of each fragment (paired-end sequencing). The details of the sequencing reactions are also platform-specific.
Because of the flexibility of NGS, a myriad of different wet-lab library preparation techniques have been developed to focus sequencing efforts on different aspects of genomic biology. Among other applications, ingenious methods have been developed to elucidate the complex domain structure of unwound chromatin in interphase nuclei, to reveal genome-wide patterns of transcription factor binding and epigenetic modifications, and to study the spectrum of RNA/RNA binding protein interactions.53–57 However, a full description of these techniques is beyond the scope of this chapter. From the standpoint of clinical medicine, today, most assay designs are geared towards identification of patient-specific genetic variants, either constitutional or somatic, via either a whole-genome or (most frequently) a targeted sequencing approach. As with any other diagnostic technique, careful planning is required to ensure that an NGS assay will have the power to detect the full spectrum of sizes and locations of desired genetic features.
Despite its bioinformatic complexity, whole genome sequencing (WGS) is perhaps the simplest sequencing application to perform, and the creation of a whole genome library is actually a first step for a variety of applications. As described earlier, after the basic steps of fragmentation and adaptor ligation, genomic DNA fragments are essentially ready for sequencing. The newest NGS platforms enable the complete sequencing of an entire human genome in only one day, at a cost of a few thousand dollars, a remarkable advance reducing cost by several orders of magnitude over just the past decade. WGS has been an invaluable research tool and is beginning to have a clinical impact.58 As costs continue to decrease, it should see expanded or even routine clinical use.
However, because of the current expense associated with WGS, various targeted sequencing approaches are gaining traction in research and clinical laboratories. These utilize a variety of wet-lab chemistry approaches to create sequencing libraries enriched for DNA sequences of interest. Targeted sequencing can be readily performed, either by fishing out fragments of interest from a whole genome library using capture hybridization, or by preparing targeted sequencing libraries directly from genomic DNA via multiplex PCR or other methods (Figure 10.9). Library preparation strategies can be devised to address essentially any question related to nucleic acid biology, giving laboratories a great deal of freedom to design assays in order to meet the needs of their clinician and patient populations.
Figure 10.9 NGS library preparation. There are many ways to produce sequencer-compatible libraries from genomic DNA. The choice of method is heavily dependent on the types and sizes of genomic features being investigated, as well as other factors such as cost, sequencing platform, etc. Typically, genomic DNA sequencing proceeds along a few lines. For WGS and some targeted sequencing approaches, DNA is first sheared by ultrasonic fragmentation into small pieces (<1 kb). After enzymatic end-repair, adapter sequences are ligated to the ends to produce a sequencer-ready whole-genome library. If this library is applied to the sequencer, the result will be WGS. If instead certain sequences of interest are first captured from this library prior to sequencing, the sequencing will be targeted (i.e., exome or other targeted capture sequencing). Alternatively, genomic DNA may be processed in a variety of different ways to produce targeted libraries. For example, targeted libraries can be produced via multiplex PCR or restriction digest–based methods, yielding short-fragment libraries with appropriate barcoded adapters. Every targeted method can be customized to concentrate the sequencing effort on the genomic territory of interest.
One important targeted sequencing approach is whole-exome sequencing (WES), in which capture baits are used to select protein-coding sequence fragments from whole-genome libraries.59 The resulting sequence data are heavily weighted towards this exonic coding sequence, which represents only about 2% of the genome, and data from these loci can be attained at a fraction of the cost of whole-genome sequencing. It has been estimated that as much as 80% of Mendelian disease can be explained by mutations in protein-coding DNA. This makes for a very favorable cost–benefit calculation for WES and explains why it is growing in popularity as a diagnostic choice for patients with unexplained diseases of presumably genetic origin.59–61
In preparing samples for NGS, libraries from multiple samples can be mixed together and run in a highly multiplexed fashion on a single instrument run, provided each individual library is appropriately “barcoded.” This is achieved by using slightly different adapter molecules for each patient that contain individualized sequences. Upon ligation, every DNA molecule from a patient is tagged with that patient’s unique sequence, and each molecule is then sequenced together with its barcode. After sequencing, each patient’s data can be extracted from the pooled data on the basis of these barcode sequences. This is a tremendously valuable feature that is used frequently in the clinical laboratory to provide targeted data on many patients at once, thereby reducing per-patient sequencing costs.
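The demultiplexing step can be sketched in a few lines of Python. For illustration we assume, hypothetically, that each read begins with its barcode; production pipelines typically read barcodes from separate index reads and tolerate one or two sequencing errors within the barcode, which is omitted here for clarity.

```python
def demultiplex(reads, barcode_to_patient, barcode_len=6):
    """Assign each read to a patient bin by its leading barcode;
    reads with unrecognized barcodes fall into an 'undetermined' bin."""
    bins = {patient: [] for patient in barcode_to_patient.values()}
    bins["undetermined"] = []
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        patient = barcode_to_patient.get(barcode, "undetermined")
        bins[patient].append(insert)  # store the insert sequence, sans barcode
    return bins
```

For example, with barcodes `{"AAAAAA": "patient1", "CCCCCC": "patient2"}`, the read `"AAAAAATTTT"` would be assigned to patient1 as the insert `"TTTT"`.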
The eventual output of NGS is a simple list of the sequences (reads) of all of the clonal clusters that were produced from the submitted sequencing library. Each base that is read is also paired with a quality score, indicating the instrument’s statistical confidence in that particular base assignment. Paired-end reads, where the two opposite ends of the same DNA molecule are sequenced, share linked identifiers. Despite the simplicity of the data type, analysis of NGS data can be quite challenging, due in large part to the sheer volume of sequences produced. Today’s most powerful sequencing instruments can process over half a billion individual clonal clusters in one day, yielding approximately 150 billion bases (gigabases, or Gb) of data with paired-end 150 bp sequencing reads.
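In the widely used FASTQ output format, each per-base quality score is encoded as a single ASCII character (commonly the Phred+33 convention). The following sketch decodes such a quality string into per-base error probabilities; the function name is illustrative.

```python
def phred33_to_error_probs(quality_string):
    """Decode a FASTQ quality string (Phred+33 encoding) into per-base
    error probabilities: Q = ord(char) - 33, P(error) = 10 ** (-Q / 10)."""
    return [10 ** (-(ord(char) - 33) / 10) for char in quality_string]
```

For instance, the character “I” encodes Q40, i.e., an estimated error probability of 1 in 10,000 for that base call, while “!” encodes Q0 (no confidence at all).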
For most NGS applications, the data must first be put into context in order to derive meaning. A single fragment sequence by itself is not particularly informative: it can be individually compared to the human reference sequence, but any discrepancy could be the result of either a true genetic variant or a sequencing error. Conversely, a normal read may reflect only one normal allele and reveal nothing about a second, variant allele. Therefore, almost universally, the first step in NGS bioinformatics analysis is alignment, a computational process by which all of the collected reads are compared with the reference genome sequence and mapped to the most likely correct position. Many software applications (both publicly available “freeware” and commercial software) are available to perform this function, though it should be noted that this process can only be performed when an appropriate reference genome sequence is available to which reads can be mapped.62,63 Once the data are aligned, sequence variants can be identified by comparing the reads at each base position against the reference genome sequence (Figure 10.10). The greater the number of reads covering a position (sequencing depth), the greater the statistical power for detecting anomalies. Likewise, increased depth minimizes the likelihood that rare random errors will be interpreted as variant sequences. For constitutional whole-genome sequencing, an average of 30x depth is the currently accepted standard. However, if expected variants are rarer than, for example, a SNP at 50% allelic frequency, greater depth is required to produce the same degree of confidence. Suppose, for example, that only one out of 100 DNA fragments corresponding to codon 12 of the KRAS gene were mutated in a particular tumor sample: with only 30 samplings, we would be unlikely to detect even a single variant molecule!
This is a critical factor in cancer analysis, because tumor heterogeneity and interference by normal cells can very easily lead to low mutational allelic percentage in cancer specimens, requiring much higher read depth to attain adequate sensitivity (Figure 10.10).64
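The relationship between read depth, allele fraction, and detection sensitivity can be made concrete with a simple binomial model. This is an idealization that ignores sequencing errors and mapping biases, and the function name is illustrative.

```python
from math import comb

def prob_detect(depth, maf, min_variant_reads=1):
    """Probability of observing at least min_variant_reads mutant reads
    at a locus sequenced to `depth`, under a binomial model in which
    each read independently samples the mutant allele with probability
    `maf` (sequencing error and mapping bias are ignored)."""
    p_fewer = sum(comb(depth, k) * maf ** k * (1 - maf) ** (depth - k)
                  for k in range(min_variant_reads))
    return 1 - p_fewer
```

Under this model, a heterozygous SNP (50% allele fraction) is virtually certain to be observed at 30x depth, whereas a 1% variant, as in the KRAS example above, is seen in only about a quarter of such 30-read samplings, which is why tumor panels use far deeper coverage.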
Figure 10.10 Increased NGS read depth is needed to call low allelic percentage mutations with high confidence. In cancer specimens, which often show significant sub-clonality and admixture with normal cells, clinically relevant mutations can be present at very low mutant allelic frequencies (MAF). This may be true in other clinical scenarios as well (e.g., mitochondrial disease). In the cancer setting, it is clinically desirable to be able to reliably detect mutations at 5% MAF. Here, samples with MAF between 5% and 10% required at least 250x read depth for high-sensitivity detection (compare with 30x read depth, the gold standard for typical inherited genetics). Below 5% MAF, extremely high read depths are required. Specificity suffers as well at very low MAF, because it becomes difficult to discriminate very low percentage mutations from very low percentage sequencing errors. (Courtesy of Foundation Medicine, Cambridge, Massachusetts)80
Larger insertions/deletions and translocations require special informatics approaches, because reads spanning a breakage or translocation point may not “map” effectively. Mapping is based on the degree of unique matching between a read and the reference sequence. Alignment software is forgiving to a point, but a read crossing a translocation point, for example, is likely to fail to map under ordinary conditions. Care must be taken to assess and perhaps reprocess the alignment with these variants in mind, including reevaluation of sequencing depth and mapping of reads and read pairs.65,66 An important weakness of current NGS technologies is the production of only short sequence reads. This precludes straightforward analyses of long repetitive or duplicative sequences (such as CGG repeats in FMR1 or genetic typing of the pseudogene-rich HLA region), without the application of complex wet-lab preparations designed to circumvent these problems.67
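The split-read idea behind breakpoint detection can be illustrated with a toy Python example that uses exact string matching in place of a real aligner. All sequences and function names here are hypothetical; real tools inspect soft-clipped alignments and discordant read pairs rather than exact substring matches.

```python
def locate(fragment, reference):
    """Naive exact-match 'aligner': position of fragment in reference, or None."""
    pos = reference.find(fragment)
    return pos if pos != -1 else None

def split_read_candidate(read, ref_a, ref_b):
    """Report a read as breakpoint-spanning if it fails to map intact to
    either reference, but its left half maps to ref_a and its right half
    maps to ref_b. Returns the flanking coordinates, or None."""
    if locate(read, ref_a) is not None or locate(read, ref_b) is not None:
        return None  # the read maps intact somewhere; not a split read
    half = len(read) // 2
    left_pos = locate(read[:half], ref_a)
    right_pos = locate(read[half:], ref_b)
    if left_pos is not None and right_pos is not None:
        return (left_pos + half, right_pos)  # breakpoint coordinate on each side
    return None
```

A read that maps cleanly end-to-end returns None, which mirrors the point made above: it is precisely the reads that fail ordinary mapping that carry the breakpoint information.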
Copy number variations may also be identified by NGS testing via analysis of coverage depth, producing data analogous to that of a CGH array. For example, an idealized sample with a deleted chromosomal segment would be expected to show roughly half the read depth at that locus if the deletion were heterozygous, and absent coverage if the deletion were homozygous.68 Taking this idea a step further, counting reads on a per-chromosome basis can help reveal the presence of extra or missing chromosomes. The most striking examples of this concept in clinical testing today are new non-invasive assays for the detection of fetal trisomies. In pregnancy, approximately ten percent of cell-free DNA may be derived from the fetus.69 A fetus with a trisomy will contribute extra copies of that chromosome to the mother’s blood, which can be detected simply by counting and analyzing the number of NGS reads that map to each individual chromosome.70 This technique is already having an impact in prenatal care, helping to detect these anomalies with high sensitivity and specificity, thus reducing unnecessary amniocentesis and chorionic villus sampling procedures.71
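The read-counting principle behind non-invasive trisomy screening can be sketched as a z-score computation. The reference mean and standard deviation used below are placeholders for illustration, not real population values, and clinical pipelines apply additional corrections (e.g., for GC content).

```python
def trisomy_zscore(read_counts, reference_fractions, chrom="chr21"):
    """z-score of the observed fraction of reads mapping to `chrom`
    against the (mean, sd) of that fraction in euploid reference
    pregnancies; an elevated score suggests fetal trisomy."""
    total = sum(read_counts.values())
    observed_fraction = read_counts[chrom] / total
    mean, sd = reference_fractions[chrom]
    return (observed_fraction - mean) / sd
```

Because the fetal contribution to cell-free DNA is only on the order of ten percent, an extra fetal chromosome shifts the chromosome’s read fraction by only a small amount, which is why large read counts and tight reference distributions are needed for a confident call.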
It should be noted that RNA is also quite amenable to NGS analysis following reverse transcription into cDNA. RNA-seq, as it is termed, can reveal which genes are actively transcribed in a sample, providing important contributory data to other genomics studies.72 Gene expression levels can also be compared between samples via analysis of depth of coverage, and evaluation of read mapping can uncover information about the expression of specific transcripts and alternative mRNA splicing. Additionally, RNA analysis can be used to uncover particular cancer translocations that produce fusion genes.73 Such gene fusion events typically take place via breakpoints within intronic sequences. As introns are frequently orders of magnitude larger than exons, hunting for fusion breakpoints using genomic DNA requires covering large areas. Breakpoint introns also often contain repetitive sequences that can create mapping problems. The process of transcription and mRNA splicing removes intronic sequences, making the mature mRNA a more straightforward target for analysis. Traditional PCR-based translocation assays also routinely utilize RNA for the same reason.
Clinical Integration of NGS
The computational processing associated with NGS analysis can be quite demanding. Processing the data from just one instrument involves the production of multiple intermediary files potentially equaling hundreds of gigabytes, and can take days to process on even some of the fastest computers. The era of big data has arrived, and we face large hurdles in the coming years as this technology continues to move into the clinic.74,75 For each laboratory to install and maintain sufficient computational infrastructure to support NGS applications would be prohibitively expensive, and not enough bioinformatics expertise currently exists to allow every laboratory to participate. For this reason, many groups are developing cloud-computing resources to support NGS operations, with the hope that economies of scale and resource sharing (informatics pipelines, etc.) will allow smaller clinics and laboratories to utilize NGS testing in the care of their patients.76
Clinical interpretation of NGS tests is fundamentally no different from that of more traditional assays. As always, proper controls, documentation, and adherence to strict workflows are required in order to ensure high confidence in the primary data before it may be interpreted in the context of the clinical scenario. For findings that are well documented and understood within the clinical context, interpretation is quite straightforward. For example, a truncating mutation in the dystrophin gene is easily interpretable in the context of a patient with findings characteristic of Duchenne muscular dystrophy (DMD). However, interpretation can be made more challenging by a few factors, alone or in combination, including:
• Atypical clinical scenario: A mutation of high expected severity may be difficult to interpret if the patient’s presentation is atypical for the disease. It should be noted that “wellness” is an atypical clinical scenario for every known disease. This raises the substantial problem of how to interpret and act upon genomic data in the context of well patients.
• Indeterminate mutational effects: Even in a patient with a classical presentation, mutations with unknown effects on protein function may be difficult to interpret, even if they lie in the “correct” gene. This is particularly true of substitution mutations. Many software algorithms are available to help predict the impact of a mutation on protein function, but this is an imperfect science.
• Indeterminate gene effects: Similarly, mutations in genes with little or no known association with the presumed disease process may be difficult or impossible to classify as clinically pathogenic, even if they are predicted to severely affect the gene. Often this requires perusal of the scientific literature to help postulate mechanistic links, though this is most frequently insufficient to produce a clear determination of pathogenicity.
In some ways, the application of NGS to clinical medicine is a double-edged sword, because as a direct by-product of its power and scope, it raises the likelihood of all of the above issues and thus the chance of producing indeterminate results. Fortunately, many tools are emerging to help us sort through the data. Many public and private databases are available that contain disease–gene associations and previously documented disease mutations, both for constitutional genetic disease (e.g., Online Mendelian Inheritance in Man, ClinVar, Human Gene Mutation Database, etc.) and cancer (The Cancer Genome Atlas, Catalogue of Somatic Mutations in Cancer, etc.). Similarly, to help avoid confusion between common inherited SNPs and potential disease mutations, other critical sources provide compendiums of both common and rare inherited variants (e.g., Exome Variant Server, The 1000 Genomes Project, dbSNP, HapMap Project, etc.).
While these resources are tremendously valuable, in practice there is a frequent need for additional information, which is often unavailable. When presented with a variant of uncertain significance (VUS), whether somatic or constitutional, the only ways to develop confidence in its clinical importance are either to document its biological effect in a laboratory, or to produce corroborating evidence from another patient or affected family. What is currently missing in our health systems is a way to share this type of information broadly between laboratories to help identify similar cases. With the appropriate exchange of knowledge, the same VUS identified at two different laboratories could instead become two successful diagnoses. Today, a number of groups are working to create programs and systems to enable such sharing, and it is hoped that cloud computing and other communal resources can help support such endeavors.75,77 Clearly, there are important privacy concerns surrounding this type of data, and different rules governing such communications in each country, and any reforms or agreements must be made in a careful and responsible fashion.78,79
CURRENT AND FUTURE DIRECTIONS IN CLINICAL GENOMICS
Due to ongoing technological upheaval, the field of clinical genomics is in an exciting state of flux. In many cases, there is still minimal consensus regarding what is the most appropriate way to apply this new technology, particularly NGS, to individualized patient care. Technical capabilities are changing so quickly that best practices are in many ways a moving target. One clear trend is the current rapid replacement of traditional Sanger-based gene-sequencing tests with NGS assays. Full gene-sequencing by Sanger, particularly for large genes implicated in constitutional genetic disease, is a laborious process requiring separate PCRs and sequencing reactions for each exon for each patient. In contrast, NGS enables the rapid and inexpensive sequencing of panels of many genes, increasing the likelihood of successful diagnoses and helping prevent diagnostic odysseys. Similarly, nearly any assay relying on PCR to uncover small anomalies can be performed better and for less cost via NGS, though some smaller mainstay assays continue to rely on PCR.
In cancer diagnostics laboratories, most traditional assays for detecting small-scale mutational events (including both oncogene and tumor-suppressor mutations) are being rapidly retired in favor of NGS profiling assays. In many academic and commercial laboratories, targeted examination of tens to hundreds of different cancer-related genes is now routine, with the goal of providing individualized treatment recommendations based on each patient’s mutational spectrum.80 Other clinical laboratories are adopting a strategy of even wider analysis, performing exome and even genome sequencing in order to uncover treatable targets. Some are convinced that this should be the ultimate strategic goal for all cancer patients, while others believe that data from whole genome cancer sequencing studies should be distilled so that clinical diagnostics can focus on the relevant (recurrently mutated) targets for each tumor.
With respect to large anomalies, particularly in cancer, certain types of variation are more amenable to NGS analysis than others. For example, NGS is well suited for targeted translocation detection, because the identification of any cancer-specific fusion sequence is diagnostic. In contrast, it is more difficult for NGS or SNP arrays to outperform FISH or cytogenetics for copy number analysis. This is because FISH and cytogenetics operate on a single-cell basis, providing data from each individual cell analyzed. Thus, FISH analysis for HER2 amplification in a breast cancer biopsy would be able to detect significant amplification in a small number of tumor cells, even if the biopsy were heavily contaminated with normal cells. In contrast, NGS and SNP arrays provide only an average result across all sampled cells. In order to match the sensitivity of traditional methods for heterogeneous samples (particularly when the tumor cell proportion is low), single-cell analysis or specific microdissection/sorting may be required. Single-cell sequencing is currently performed in the research setting, but as of today it is still too expensive and error-prone to be used as a clinical diagnostic. Thus, for the foreseeable future, FISH and cytogenetics will retain an important role in the work-up of certain tumors.
Looking to the future, there is vast potential for novel and expanded uses of NGS technology in clinical medicine. While clinical laboratories are beginning to get a handle on DNA sequence analysis, many of the sequencing approaches that are routinely used in research laboratories are still uncharted territory for diagnostics. For example, there are essentially no clinically certified assays performed today that are based on RNA-seq. In clinical laboratories, RNA is typically analyzed either to investigate expression or to look for clinically relevant fusion (translocation) transcripts. With respect to gene expression, most clinical laboratories’ needs are met by oligonucleotide microarrays, which have become quite standardized and inexpensive.81 In contrast, gene-expression analysis by RNA-seq is still regarded as informatically complex. Therefore, it is most likely that RNA-seq will first find widespread adoption in clinical laboratories as a tool for translocation detection, as oligonucleotide arrays are incapable of detecting these anomalies.
As discussed, many other wet-lab library preparation methods exist to interrogate various aspects of genome biology, but technical challenges, shortages of expertise, and a lack of key clinical questions serve to slow the adoption of these methods into clinical laboratories. Examples include methylation sequencing or any of the methods revolving around protein pull-down to investigate protein–nucleic acid interactions.82,83
In addition to allowing more powerful analyses of traditional specimens, the freedom provided by NGS to explore essentially any sample type is opening up entirely new ways of thinking about health monitoring and diagnostics. Analysis of non-human genomes, such as those of parasites, bacteria, and viruses, is expanding, and recent discoveries relating health and the microbiome are beginning to change our understanding of self and wellness.84 For cancer and many other diseases, NGS and other technologies offer much potential for early screening and detection via assays to detect scant nucleic acid signatures in blood and other tissues (e.g., circulating tumor DNA). Maternal blood trisomy testing has opened the doors to many new possibilities for prenatal diagnostics. It is now possible to sequence the entire genome of a fetus from maternal plasma, raising both hope for this technology as well as potential ethical concerns.85
Many NGS applications, particularly WGS, are still quite expensive. However, the cost of sequencing has fallen precipitously over the last decade, and this trend seems likely to continue. If it does, applications like WGS may reach a point at which they are routinely affordable in the clinic. When that happens, it will place great pressure on our electronic health systems, as well as on the pathologists and geneticists who will face the task of interpreting all of the data. This may be compounded by the emergence of even more powerful technologies that may supplant NGS in the coming years. Single-molecule (or third-generation) sequencing systems are currently available that can operate directly on individual DNA molecules without requiring on-instrument amplification. As a result, they are not limited to short read lengths and can process DNA in a fraction of the time compared with NGS instruments.86,87 While they have not yet had an impact on the clinical diagnostics landscape, they offer an intriguing view of what awaits us over the horizon.
1. Lander ES, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409(6822):860–921.
2. Venter JC, et al. The sequence of the human genome. Science. 2001;291(5507):1304–1351.
3. 1000 Genomes Project Consortium, et al. A map of human genome variation from population-scale sequencing. Nature. 2010;467(7319):1061–1073.
4. ENCODE Project Consortium, et al. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489(7414):57–74.
5. Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature. 2012;490(7418):61–70.
6. Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 2008;455(7216):1061–1068.
7. Cancer Genome Atlas Research Network. Integrated genomic analyses of ovarian carcinoma. Nature. 2011;474(7353):609–615.
8. Cancer Genome Atlas Research Network. Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature. 2013;499(7456):43–49.
9. Atkin NB, et al. The superfemale mole. Lancet. 1962;280(7258): 727–728.
10. Saiki RK, et al. Enzymatic amplification of beta-globin genomic sequences and restriction site analysis for diagnosis of sickle cell anemia. Science. 1985;230(4732):1350–1354.
11. Cheng S, et al. Effective amplification of long targets from cloned inserts and human genomic DNA. Proc Natl Acad Sci U S A. 1994;91(12):5695– 5699.
12. Tjio JH, et al. The chromosome number in man. Hereditas. 1956;42:1–6.
13. Craig JM, et al. Genes and genomes: Chromosome bands—flavours to savour. Bioessays. 1993;15:349–354.
14. Drets ME, et al. Specific banding patterns of human chromosomes. Proc Natl Acad Sci U S A. 1971;68(9):2073–2077.
15. Shaffer LG, McGowan-Jordan J, Schmid M. An International System for Human Cytogenetic Nomenclature. Recommendations of the International Standing Committee on Human Cytogenetic Nomenclature. Published in collaboration with ‘Cytogenetic and Genome Research’. Plus fold-out: ‘The Normal Human Karyotype G- and R-bands’. 2013. http://www.karger.com/Book/Home/257302
16. Rowley JD, et al. Relationship of centromeric heterochromatin to fluorescent banding patterns of metaphase chromosomes in the mouse. Nature. 1971;231(5304):503–506.
17. Blakemore KJ, et al. A method of processing first-trimester chorionic villous biopsies for cytogenetic analysis. Am J Hum Genet. 1984;36(6):1386–1393.
18. Ferguson-Smith MA. Cytogenetics and the evolution of medical genetics. Genet Med. 2008;10(8):553–559.
19. Nowell PC, et al. Chromosome studies on normal and leukemic human leukocytes. J Natl Cancer Inst. 1960;25:85–109.
20. Rowley JD. A new consistent chromosomal abnormality in chronic myelogenous leukaemia identified by quinacrine fluorescence and Giemsa staining. Nature. 1973;243(5405):290–293.
21. Vardiman JW, et al. The 2008 revision of the World Health Organization (WHO) classification of myeloid neoplasms and acute leukemia: rationale and important changes. Blood. 2009;114(5):937–951.
22. Rowley JD, et al. 15/17 translocation, a consistent chromosomal change in acute promyelocytic leukaemia. Lancet. 1977;1(8010):549–550.
23. Sakurai M, et al. 8–21 translocation and missing sex chromosomes in acute leukaemia. Lancet. 1974;2(7874):227–228.
24. List A, et al. Lenalidomide in the myelodysplastic syndrome with chromosome 5q deletion. N Engl J Med. 2006;355(14):1456–1465.
25. Pinkel D, et al. Cytogenetic analysis using quantitative, high-sensitivity, fluorescence hybridization. Proc Natl Acad Sci U S A. 1986;83(9):2934–2938.
26. Trask BJ. Fluorescence in situ hybridization: applications in cytogenetics and gene mapping. Trends Genet. 1991;7(5):149–154.
27. van Ommen GJ, et al. FISH in genome research and molecular diagnostics. Curr Opin Genet Dev. 1995;5(3):304–308.
28. Osoegawa K, et al. A bacterial artificial chromosome library for sequencing the complete human genome. Genome Res. 2001;11(3):483–496.
29. Trask BJ, et al. Fluorescence in situ hybridization to interphase cell nuclei in suspension allows flow cytometric analysis of chromosome content and microscopic analysis of nuclear organization. Hum Genet. 1988;78(3):251–259.
30. Miller WH Jr., et al. Reverse transcription polymerase chain reaction for the rearranged retinoic acid receptor alpha clarifies diagnosis and detects minimal residual disease in acute promyelocytic leukemia. Proc Natl Acad Sci U S A. 1992;89(7):2694–2698.
31. Wolff AC, et al. Recommendations for human epidermal growth factor receptor 2 testing in breast cancer: American Society of Clinical Oncology/College of American Pathologists clinical practice guideline update. J Clin Oncol. 2013;31(31):3997–4013.
32. Schrock E, et al. Multicolor spectral karyotyping of human chromosomes. Science. 1996;273(5274):494–497.
33. Schrock E, et al. Spectral karyotyping refines cytogenetic diagnostics of constitutional chromosomal abnormalities. Hum Genet. 1997;101(3):255–262.
34. Karpova MB, et al. Combined spectral karyotyping, comparative genomic hybridization, and in vitro apoptyping of a panel of Burkitt’s lymphoma-derived B cell lines reveals an unexpected complexity of chromosomal aberrations and a recurrence of specific abnormalities in chemoresistant cell lines. Int J Oncol. 2006;28(3):605–617.
35. Veldman T, et al. Hidden chromosome abnormalities in haematological malignancies detected by multicolour spectral karyotyping. Nat Genet. 1997;15(4):406–410.
36. Southern EM. Detection of specific sequences among DNA fragments separated by gel electrophoresis. J Mol Biol. 1975;98(3):503–517.
37. Monaghan KG, et al. ACMG standards and guidelines for fragile X testing: a revision to the disease-specific supplements to the standards and guidelines for clinical genetics laboratories of the American College of Medical Genetics and Genomics. Genet Med. 2013;15(7):575–586.
38. Schouten JP, et al. Relative quantification of 40 nucleic acid sequences by multiplex ligation-dependent probe amplification. Nucleic Acids Res. 2002;30(12):e57.
39. Hardenbol P, et al. Multiplexed genotyping with sequence-tagged molecular inversion probes. Nat Biotechnol. 2003;21(6):673–678.
40. Willis AS, et al. Multiplex ligation-dependent probe amplification (MLPA) and prenatal diagnosis. Prenat Diagn. 2012;32(4):315–320.
41. Hömig-Hölzel C, et al. Multiplex ligation-dependent probe amplification (MLPA) in tumor diagnostics and prognostics. Diagn Mol Pathol. 2012;21(4):189–206.
42. Pinkel D, et al. High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat Genet. 1998;20(2):207–211.
43. Solinas-Toldo S, et al. Matrix-based comparative genomic hybridization: biochips to screen for genomic imbalances. Genes Chromosomes Cancer. 1997;20(4):399–407.
44. Sapolsky RJ, et al. High-throughput polymorphism screening and genotyping with high-density oligonucleotide arrays. Genet Anal. 1999;14(5–6):187–192.
45. Klein RJ, et al. Complement factor H polymorphism in age-related macular degeneration. Science. 2005;308(5720):385–389.
46. The Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447(7145):661–678.
47. Conlin LK, et al. Mechanisms of mosaicism, chimerism and uniparental disomy identified by single nucleotide polymorphism array analysis. Hum Mol Genet. 2010;19(7):1263–1275.
48. Miller DT, et al. Consensus statement: chromosomal microarray is a first-tier clinical diagnostic test for individuals with developmental disabilities or congenital anomalies. Am J Hum Genet. 2010;86(5):749–764.
49. Brenner S, et al. Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nat Biotechnol. 2000;18(6):630–634.
50. Alföldi J, et al. Comparative genomics as a tool to understand evolution and disease. Genome Res. 2013;23(7):1063–1068.
51. Koboldt DC, et al. The next-generation sequencing revolution and its impact on genomics. Cell. 2013;155(1):27–38.
52. Shendure J, et al. Next-generation DNA sequencing. Nat Biotechnol. 2008;26(10):1135–1145.
53. Dixon JR, et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 2012;485(7398):376–380.
54. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489(7414):57–74.
55. Lister R, et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature. 2009;462(7271):315–322.
56. Robertson G, et al. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat Methods. 2007;4(8):651–657.
57. Licatalosi DD, et al. HITS-CLIP yields genome-wide insights into brain alternative RNA processing. Nature. 2008;456(7221):464–469.
58. Bainbridge MN, et al. Whole-genome sequencing for optimized patient management. Sci Transl Med. 2011;3(87):87re3.
59. Ng SB, et al. Targeted capture and massively parallel sequencing of 12 human exomes. Nature. 2009;461(7261):272–276.
60. Worthey EA, et al. Making a definitive diagnosis: successful clinical application of whole exome sequencing in a child with intractable inflammatory bowel disease. Genet Med. 2011;13(3):255–262.
61. Bamshad MJ, et al. Exome sequencing as a tool for Mendelian disease gene discovery. Nat Rev Genet. 2011;12(11):745–755.
62. Langmead B, et al. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):R25.
63. Li H, et al. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25(14):1754–1760.
64. Ulahannan D, et al. Technical and implementation issues in using next-generation sequencing of cancers in clinical practice. Br J Cancer. 2013;109(4):827–835.
65. Xi R, et al. Detecting structural variations in the human genome using next generation sequencing. Brief Funct Genomics. 2010;9(5–6):405–415.
66. Jiang Y, et al. PRISM: pair-read informed split-read mapping for base-pair level detection of insertion, deletion and structural variants. Bioinformatics. 2012;28(20):2576–2583.
67. Wang C, et al. High-throughput, high-fidelity HLA genotyping with deep sequencing. Proc Natl Acad Sci U S A. 2012;109(22):8676–8681.
68. Baslan T, et al. Genome-wide copy number analysis of single cells. Nat Protoc. 2012;7(6):1024–1041.
69. Lo YM, et al. Presence of fetal DNA in maternal plasma and serum. Lancet. 1997;350(9076):485–487.
70. Chiu RW, et al. Noninvasive prenatal diagnosis of fetal chromosomal aneuploidy by massively parallel genomic sequencing of DNA in maternal plasma. Proc Natl Acad Sci U S A. 2008;105(51):20458–20463.
71. Mersy E, et al. Noninvasive detection of fetal trisomy 21: systematic review and report of quality and outcomes of diagnostic accuracy studies performed between 1997 and 2012. Hum Reprod Update. 2013;19(4):318–329.
72. Jongeneel CV, et al. An atlas of human gene expression from massively parallel signature sequencing (MPSS). Genome Res. 2005;15(7):1007–1014.
73. Edgren H, et al. Identification of fusion genes in breast cancer by paired-end RNA-sequencing. Genome Biol. 2011;12(1):R6.
74. Tucker T, et al. Massively parallel sequencing: the next big thing in genetic medicine. Am J Hum Genet. 2009;85(2):142–154.
75. Grossman RL, et al. A vision for a biomedical cloud. J Intern Med. 2012;271(2):122–130.
76. Schatz MC, et al. Cloud computing and the DNA data race. Nat Biotechnol. 2010;28(7):691–693.
77. Baker M. One-stop shop for disease genes. Nature. 2012;491(7423):171.
78. Lucassen A, et al. Consent and confidentiality in clinical genetic practice: guidance on genetic testing and sharing genetic information. Clin Med. 2012;12(1):5–6.
79. Creating a Global Alliance to Enable Responsible Sharing of Genomic and Clinical Data: the Global Genome Alliance. https://www.broadinstitute.org/files/news/pdfs/GAWhitePaperJune3.pdf
80. Frampton GM, et al. Development and validation of a clinical cancer genomic profiling test based on massively parallel DNA sequencing. Nat Biotechnol. 2013;31(11):1023–1031.
81. Arpino G, et al. Gene expression profiling in breast cancer: a clinical perspective. Breast. 2013;22(2):109–120.
82. Laird PW. Principles and challenges of genome-wide DNA methylation analysis. Nat Rev Genet. 2010;11(3):191–203.
83. Park PJ. ChIP-seq: advantages and challenges of a maturing technology. Nat Rev Genet. 2009;10(10):669–680.
84. The Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature. 2012;486(7402):207–214.
85. Fan HC, et al. Non-invasive prenatal measurement of the fetal genome. Nature. 2012;487(7407):320–324.
86. Eid J, et al. Real-time DNA sequencing from single polymerase molecules. Science. 2009;323(5910):133–138.
87. Howorka S, et al. Sequence-specific detection of individual DNA strands using engineered nanopores. Nat Biotechnol. 2001;19(7):636–639.