THE FUTURE OF CANCER IN THE POST-GENOMIC ERA
The unveiling of the sequence of DNA in the human genome in 2003 was one of the most dramatic milestones in the history of science. Nevertheless, even in the immediate aftermath of that event, few would have predicted the extraordinary advances of the following eight years that now permit individual genomes to be sequenced with great rapidity at low cost and have prompted an endeavour to compile a database of 10,000 complete cancer genomes. Within that period whole genome sequencing has revealed new cancer genes and promoted the development of novel drugs. It has illuminated hitherto unsuspected flexibility in human DNA, provided alternative strategies for the classification of tumours and already begun to change the treatment regimes that clinicians offer to patients. An armoury of great breadth and sophistication can now be deployed for the detection, classification and treatment of cancer.
The draft sequence of the human genome was largely completed by April 2003 in a phenomenal achievement that required quite stunning developments in sequencing machines, in robotics to handle clones and in computing power to process the data and make it easily usable by the scientific community. Since 2003, equally dramatic technological advances have produced an almost unbelievable increase in the rate at which sequences can be obtained. These new technologies are called ‘next-generation’ or ‘second-generation’ sequencing and permit rapid, so-called ‘massively parallel’ sequencing of complete genomes in a single experiment. Efficient though second-generation sequencing is, it may be about to be overtaken by ‘third-generation’ sequencing in which an accurate sequence can be obtained without sequencing thousands of copies.
These advances have resulted in huge increases in both the speed and the precision of sequencing and ushered in the era of ‘personalised medicine’, meaning that individual genomes can be sequenced in a day for a cost approaching US$1,000. This has the revolutionary implication that the mutation pattern of an individual tumour can be used to design a therapeutic strategy before treatment is started. We will see shortly when considering tumour biomarkers that genome sequencing also offers a method both for monitoring tumour response to treatment and potentially for tumour detection from DNA in blood long before there are any clinical signs (palpable lumps, bleeding, etc.). From this, it is evident that the science of genomics is poised to make the greatest impact on medical science of any advance in our history. The following examples illustrate some of the major advances that have already occurred in the first decade of the twenty-first century.
Following the full sequencing of the human genome, Michael Stratton, working at the Sanger Centre, completed a study that would have been inconceivable just a few years earlier: inconceivable because it required the sequence of the human genome but also because it relied on the fantastic technical developments that permitted massive lengths of DNA to be sequenced at high speed. Five hundred kinase enzymes were selected that were known or thought to be involved in cell signalling pathways that controlled growth. A large number of tumour samples were then screened by sequencing to see if there were any mutations in the kinases. As some kinases were already known to be frequently mutated in human cancers, the appearance of some of these was anticipated. What came as a surprise, however, was that one kinase gene, BRAF, had the same single base mutation (V600E) in about two-thirds of the tumours. The tumour they had chosen was melanoma – the cancer whose frequency of occurrence is increasing most rapidly in the UK population – about which virtually nothing was known at the molecular level. Stratton’s group had discovered a new ‘cancer gene’ that played a major role in a very prevalent cancer. This rapidly led to the synthesis of a drug that is very efficient at blocking the action of the mutant form of BRAF (see below) and thus offers a chemotherapeutic approach to treating melanoma. In the end, of course, BRAF’s role in melanoma would have been discovered by the same slow and painstaking methods that had previously identified many cancer genes, but that might have taken years or even decades. Whole genome sequencing and the era of genomics had made an almost immediate impact on medical science.
In 2007, the J. Craig Venter Institute released the first individual genome sequence, that of the institute’s founder. This was followed almost immediately by that of James Watson, from the Human Genome Sequencing Center in Houston, the first full genome to be sequenced using next-generation rapid-sequencing technology. Shortly afterwards, in 2008, three complete sequences of individual human genomes were published simultaneously. These were from a male Yoruba from Ibadan, Nigeria, a male Han Chinese and a female who had died from acute myeloid leukaemia (AML). The new technology of massively parallel sequencing permitted repeated determinations to give an ‘average depth’ of over 30 times, greatly increasing the confidence with which variants could be assigned. Each of these sequences revealed several million single nucleotide variants (SNVs) when compared with reference data, for example the human reference assembly and the single nucleotide polymorphism database (dbSNP), run by the National Center for Biotechnology Information (NCBI), which includes a range of molecular variation in addition to SNPs (Fig. 1).
1. Genomic fluidity illustrated by the overlap of SNPs between the genomes from different tissues in one individual and between individuals.
The AML study identified nearly four million tumour SNVs. Somewhat surprisingly, the majority of these were also present in the reference genome or in the Venter or Watson genomes and, after their subtraction, 31,632 new SNVs remained that were unique to the tumour genome (Fig. 1). Most of these were in intronic regions or untranslated regions but 14 were validated as germline SNVs (i.e. SNPs) and eight were somatically acquired, non-synonymous mutations (in CDH24, SLC15A1, KNDC1, PTPRT, GRINL1B, GPR123, EBI2 and PCLKC). Mutations in FLT3 and NPM1 that had been identified previously were also detected.
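The subtraction described above is, in essence, a set difference. The following is a minimal Python sketch, assuming that each variant can be keyed by chromosome, position, reference base and alternative base. The function name `novel_variants` and the toy variant sets are hypothetical illustrations, not data or code from the AML study.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int   # 1-based coordinate (assumed convention)
    ref: str   # base in the reference assembly
    alt: str   # base observed in the sample


def novel_variants(tumour, *known_sets):
    """Return the tumour variants that are absent from every known set."""
    known = set().union(*known_sets)
    return set(tumour) - known


# Hypothetical toy data standing in for millions of real calls.
tumour_snvs = {
    Variant("1", 1000, "A", "G"),
    Variant("2", 500, "C", "T"),
    Variant("7", 42, "G", "A"),
}
dbsnp = {Variant("1", 1000, "A", "G")}
venter = {Variant("2", 500, "C", "T")}
watson = set()

unique_to_tumour = novel_variants(tumour_snvs, dbsnp, venter, watson)
print(sorted(unique_to_tumour))
```

A real pipeline would also need to separate germline from somatic variants, typically by comparison with normal tissue from the same patient, which this sketch does not attempt.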
The extensive overlap between SNPs in the AML tumour and in the reference genomes highlights the astonishingly dynamic nature of human DNA and indicates that the definition of ‘normality’ with regard to sequence is somewhat arbitrary. This ever-changing background might suggest the impossibility of detecting individual polymorphisms that make small contributions to tumour development. However, as with the polymorphisms that increase expression of the FGFR2 gene and promote ER+ breast cancers, such associations can be identified provided the numbers of cases and controls studied are sufficiently large.
Subsequent whole genome sequencing has characterised copy-number alterations in primary lung adenocarcinomas, identifying 57 significantly recurrent events in 371 tumours. These included 24 amplifications and 7 homozygous deletions, 25 of these gross changes not having been previously associated with lung cancer. Also, 26 of 39 chromosome arms had large-scale copy number gain or loss. The mutational signature in a small cell lung cancer genome arising from carcinogens in tobacco smoke has also been resolved to reveal 22,910 somatic mutations in the genome of one individual, of which 134 were in coding exons. This permitted the estimation that the cells of the tumour had acquired, on average, one mutation for every 15 cigarettes smoked.
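The scale of that estimate can be appreciated by running the arithmetic in reverse. The sketch below is only an illustration and not the calculation performed in the original study: it multiplies the mutation count quoted above by the quoted rate to give the implied lifetime total of cigarettes. The figure of 20 cigarettes per pack and the conversion to pack-years are assumptions introduced here.

```python
SOMATIC_MUTATIONS = 22_910     # quoted in the text
CIGARETTES_PER_MUTATION = 15   # quoted in the text
CIGARETTES_PER_PACK = 20       # assumption, not from the text
DAYS_PER_YEAR = 365

# Total cigarettes implied by the two quoted figures.
implied_cigarettes = SOMATIC_MUTATIONS * CIGARETTES_PER_MUTATION

# One pack-year is taken here to mean one pack per day for one year.
implied_pack_years = implied_cigarettes / CIGARETTES_PER_PACK / DAYS_PER_YEAR

print(f"{implied_cigarettes:,} cigarettes")
print(f"about {implied_pack_years:.0f} pack-years")
```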
Somatic rearrangements generating chimeric genes are a familiar mechanism of oncogenic activation in leukaemias. However, Stephens et al. (2009), using paired-end sequencing (i.e. sequencing both ends) of 65,000,000 randomly generated 500 bp DNA fragments, showed that this process also plays an important role in breast cancer (Fig. 2).
2. Somatic mutations in breast cancer represented as genome-wide Circos plots.
In 24 primary breast tumours, there was an average of 38 such rearrangements per tumour with over 200 present in some tumours. Rather than generating a fusion gene, the most common rearrangement was tandem duplication. The frequency showed an unexpected variation between tumours – from none to over 100 with a size range of duplicated segments from 3 kb to >1 Mb. From a prognostic viewpoint, the most exciting finding was that one of the four main categories of breast cancer, namely basal-like cancers, generally had large numbers of tandem duplications, fewer rearrangements being associated with luminal-A and luminal-B types.
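Paired-end sequencing reveals rearrangements because the two ends of a fragment of known length should map to the reference genome a predictable distance apart and facing each other. Pairs that break this expectation point to candidate rearrangements. The following Python sketch shows the general idea under simplifying assumptions: a forward/reverse library, a 500 bp fragment length as in the text, and a tolerance of 100 bp that is chosen arbitrarily here. The class `ReadPair` and the function `classify_pair` are hypothetical names, and the scheme is a textbook simplification rather than the method used by Stephens et al.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    chrom1: str
    pos1: int
    strand1: str  # '+' or '-'
    chrom2: str
    pos2: int
    strand2: str  # '+' or '-'


def classify_pair(pair, expected_insert=500, tolerance=100):
    """Assign a read pair to a candidate rearrangement class."""
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"
    # Order the two ends by mapped position along the chromosome.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if left_strand == right_strand:
        return "inversion"            # both ends on the same strand
    if left_strand == "-" and right_strand == "+":
        return "tandem_duplication"   # ends face away from each other
    # Normal orientation: left end '+', right end '-'.
    if right_pos - left_pos > expected_insert + tolerance:
        return "deletion"             # ends map too far apart
    return "concordant"


examples = [
    ReadPair("8", 1000, "+", "8", 1450, "-"),
    ReadPair("8", 1000, "+", "8", 9000, "-"),
    ReadPair("8", 1000, "-", "8", 9000, "+"),
    ReadPair("8", 1000, "+", "8", 9000, "+"),
    ReadPair("8", 1000, "+", "17", 300, "-"),
]
for pair in examples:
    print(classify_pair(pair))
```

In practice a single discordant pair proves little, so callers usually require several independent pairs supporting the same junction before reporting a rearrangement.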
In an alternative approach to breast cancer, the genome and the transcriptome of a breast cancer metastasis were sequenced and the genomic sequence compared with that of DNA from a primary tumour that had arisen nine years earlier. This identified 32 non-synonymous coding mutations that fell into three mutational patterns: (1) mutations in five genes present in both the primary and the metastasis; (2) six mutations that appeared to be present only in minor clones of the tumour; and (3) 19 metastasis mutations not present in the primary. This study also identified 75 RNA editing events, two of which were novel changes, affecting 12 loci that had occurred in the metastatic transcriptome. The detection of RNA editing illustrates the importance of integrating genomic and RNA sequencing and, taken together, the data reveal the extent of evolution that can occur during metastatic progression.
Turning to pancreatic carcinomas, 1,562 somatic mutations have been identified in a screen of 24 tumours. From these emerged a set of 12 intracellular signalling pathways, six of which had a mutation in one of their constituent genes in all 24 of the tumours screened. The other six pathways were also mutated in over half the tumours. There was, nonetheless, diversity in the specific genes involved so that, for example, mutations in four different genes could contribute to the disruption of the TGFβ signalling pathway. This kind of genomic snapshot revealing critical pathways can only be obtained through sequencing the entire genome.
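The pathway-level view can be illustrated with a short calculation. The Python sketch below assumes that a pathway counts as disrupted in a tumour when at least one of its constituent genes is mutated, which is a simplification adopted here. The function name `pathway_hit_fraction`, the pathway names and the placeholder gene names are all hypothetical and do not come from the pancreatic study.

```python
def pathway_hit_fraction(pathways, tumour_mutations):
    """Fraction of tumours with at least one mutated gene in each pathway.

    pathways:         mapping of pathway name -> set of member genes
    tumour_mutations: mapping of tumour id -> set of mutated genes
    """
    n_tumours = len(tumour_mutations)
    if n_tumours == 0:
        raise ValueError("at least one tumour is required")
    return {
        name: sum(1 for genes in tumour_mutations.values() if genes & members)
        / n_tumours
        for name, members in pathways.items()
    }


# Placeholder data: four tumours, each hitting pathway_1 through a different gene.
pathways = {
    "pathway_1": {"GENE_A", "GENE_B", "GENE_C", "GENE_D"},
    "pathway_2": {"GENE_E"},
}
tumour_mutations = {
    "T1": {"GENE_A"},
    "T2": {"GENE_C", "GENE_E"},
    "T3": {"GENE_D"},
    "T4": {"GENE_B"},
}

fractions = pathway_hit_fraction(pathways, tumour_mutations)
print(fractions)
```

In the toy data no single gene is mutated in more than one tumour, yet pathway_1 is disrupted in every tumour. This is the pattern that gene-by-gene analysis would miss.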
These and a continuing stream of studies are revealing the immense complexity of mutational events that make every tumour unique at the detailed molecular level, even though major patterns of ‘drivers’ may characterise specific types of cancer. Bewildering though the complexity may be, it is at least consistent with the notion of protracted accumulation of mutations as clonal evolution takes its course and for the majority of cancers that picture is probably accurate. In a startling development, the application of paired-end sequencing to multiple tumours has revealed that perhaps 3% of cancers, of widely varying type, arise through a completely different mechanism of genetic instability. This takes the form of a single cataclysmic event in which a limited number of chromosomes shatter into fragments. DNA repair systems are activated that repair the damage as best they can, mainly by non-homologous end-joining, in what appears to be a random process giving rise to every type of inversion and juxtaposition of the fragments that are rescued (Fig. 3).
3. Chromosome shattering and fragment assembly in the process of chromothripsis.
Stephens and colleagues (2011) have called this process chromothripsis (Greek: thripsis – shattering into pieces) and suggest that it arises from a mitotic defect when the condensed structure of chromatin would predispose to the clustering of breaks within limited segments.
The highly focused pattern of mutations that characterises chromothripsis is evident from the genome-wide profile of rearrangements in a case of chronic lymphocytic leukaemia (Fig. 4).
4. Genomic rearrangements localised to chromosome 4q in a chronic lymphocytic leukaemia.
Other types of cancer from which evidence for chromothripsis has been obtained include melanoma, small cell lung cancer, non-small cell lung cancer, glioma, synovial sarcoma, and oesophageal, colorectal, renal and thyroid tumours.
One feature of chromothripsis is that, although alteration of gene copy number occurs at numerous locations in affected regions, the almost invariable result is a copy number of only one or two. This finding, together with the retention of heterozygosity and the intensive clustering of the alterations within discrete regions of the affected chromosome, provides strong evidence that the majority of the rearrangements occur in a single cellular catastrophe (Fig. 5).
5. Chromothripsis involving five chromosomes.
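The copy-number argument can be made concrete. Many rearrangements accumulated one after another would be expected to leave a region with many different copy-number states, whereas a single shattering event leaves segments that are either retained or lost. The Python sketch below is a rough heuristic under that assumption: it counts the distinct copy-number states and the number of switches between adjacent segments. The function names and the threshold of 10 switches are hypothetical choices made for illustration, not criteria taken from Stephens and colleagues.

```python
def copy_number_summary(segments):
    """Return the distinct states and the number of state switches.

    segments: copy-number values of consecutive segments along a chromosome.
    """
    states = set(segments)
    switches = sum(1 for a, b in zip(segments, segments[1:]) if a != b)
    return states, switches


def consistent_with_single_event(segments, min_switches=10):
    """True if copy number flips many times between at most two states."""
    states, switches = copy_number_summary(segments)
    return len(states) <= 2 and switches >= min_switches


# Toy profiles (hypothetical): alternation between two states versus
# a region showing many different copy-number states.
shattered = [2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1]
progressive = [2, 3, 5, 2, 4, 1, 6, 2, 3, 2, 4, 2]

print(consistent_with_single_event(shattered))
print(consistent_with_single_event(progressive))
```

A fuller assessment would also weigh the other lines of evidence mentioned above, namely retention of heterozygosity and the clustering of breakpoints, which this sketch ignores.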
One particularly telling example of how chromothripsis can promote cancer development has come from a small cell lung tumour sample that had huge amplification of the MYC gene on chromosome 8, producing up to 200 copies per cell. A combination of sequencing and FISH revealed one normal copy of chromosome 8 together with two massively rearranged derivative chromosomes. The latter had arisen by the random stitching of fragments from the shattered regions of the other chromosome, followed by chromosomal duplication. In addition, 15 other fragments had been pieced together to form a double minute chromosome of about 1 Mb that had subsequently undergone massive amplification. The fragments included MYC, thereby revealing the mechanism of hyper-expression of this potent proliferation driver. In parallel with specific oncogenic events, chromothripsis has also been shown to cause either disruption of tumour suppressor genes or their complete loss if they are carried by fragments that fail to be incorporated into the derivative mosaic by the DNA repair machinery.
The extraordinary phenomenon of chromothripsis may give rise to hundreds of chromosomal rearrangements in a single event. Not the least amazing aspect of this nuclear breakdown is that cells, albeit a tiny minority, can survive as functional units among which some now have a selective advantage in terms of growth and hence their capacity to evolve into a tumour clone.
These examples illustrate the bewildering complexity of instability in cancer genomes, unveiled through the power of second-generation sequencing. Although these are early steps in cancer genome sequencing, they have already revealed not only new ‘cancer genes’ that are potential therapeutic targets but also pathways that are high-frequency mutational targets in specific cancers, together with patterns of mutational evolution during progression from a primary tumour to metastasis.