2.2.1 Cytogenetics and Karyotyping
Cancer arises as a result of the stepwise accumulation of genetic changes that confer a selective growth advantage to the involved cells (see Chap. 5, Sec. 5.2). These changes may consist of abnormalities in specific genes (such as amplification of oncogenes or deletion of tumor-suppressor genes). Although molecular techniques can identify specific DNA mutations, cytogenetics provides an overall description of chromosome number, structure, and the extent and nature of chromosomal abnormalities.
Several techniques can be used to obtain tumor cells for cytogenetic analysis. Leukemias and lymphomas from peripheral blood, bone marrow, or lymph node biopsies are easily dispersed into single cells suitable for chromosomal analysis. In contrast, cytogenetic analysis of solid tumors presents several difficulties. First, the cells are tightly bound together and must be dispersed by mechanical means and/or by digestion with proteolytic enzymes (eg, collagenase), which can damage cells. Second, the mitotic index in solid tumors is often low (see Chap. 9, Sec. 9.2), making it difficult to find enough metaphase cells to obtain good-quality cytogenetic preparations. Finally, lymphoid, myeloid, and other normal cells often infiltrate solid tumors and may be confused with the malignant cell population.
Chromosomes are usually examined in metaphase, when they become condensed and appear as 2 identical sister chromatids held together at the centromere as DNA replication has already occurred at that stage of mitosis. Exposure of the tumor cells to agents such as colcemid arrests them in metaphase by disrupting the mitotic spindle fibers that normally separate the chromatids. The cells are then swollen in a hypotonic solution, fixed in methanol-acetic acid, and metaphase "spreads" are prepared by physically dropping the fixed cells onto glass microscope slides.
Chromosomes can be recognized by their size and shape and by the pattern of light and dark "bands" observed after specific staining. The most popular way of generating banded chromosomes is proteolytic digestion with trypsin, followed by a Giemsa stain. A typical metaphase spread prepared using conventional methods has approximately 550 bands, whereas cells spread at prophase can have more than 800 bands; these bands can be analyzed using bright-field microscopy and digital photography. The result of cytogenetic analysis is a karyotype, which, in written form, describes the chromosomal abnormalities using the international consensus cytogenetic nomenclature (Brothman et al, 2009; see Fig. 2–1 and Table 2–1). Table 2–2 lists common chromosomal abnormalities in lymphoid and myeloid malignancies.
TABLE 2–1 Nomenclature for chromosomes and their abnormalities.
|Description ||Meaning |
|–1 ||Loss of one chromosome 1 |
|+7 ||Gain of extra chromosome 7 |
|2q– or del (2q) ||Deletion of part of long arm of chromosome 2 |
|4p+ ||Addition of material to short arm of chromosome 4 |
|t(9;22)(q34;q11) ||Reciprocal translocation between chromosomes 9 and 22 with break points at q34 on chromosome 9 and q11 on chromosome 22 |
|iso(6p) ||Isochromosome with both arms derived from the short arm of chromosome 6 |
|inv(16)(p13q22) ||Part of chromosome 16 between p13 and q22 is inverted |
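The notation in Table 2–1 is regular enough to be read mechanically. The following is a minimal sketch, not a full nomenclature parser: the function name `describe_abnormality` and the handful of patterns it recognizes are illustrative assumptions that cover only some of the rows in the table.

```python
import re

CHROM = r"(\d+|X|Y)"      # chromosome identifier
BAND = r"([pq][\d.]+)"    # band such as q34 or q11.2
ARM = {"p": "short", "q": "long"}


def describe_abnormality(term: str) -> str:
    """Translate one karyotype term (Table 2-1 style) into plain English."""
    # Normalize typographic dashes and spaces, eg "del (2q)" or "2q–"
    term = term.replace("–", "-").replace(" ", "")

    # Whole-chromosome loss or gain, eg "-1" or "+7"
    m = re.fullmatch(r"([+-])" + CHROM, term)
    if m:
        change = "Gain of an extra" if m.group(1) == "+" else "Loss of one"
        return f"{change} chromosome {m.group(2)}"

    # Reciprocal translocation, eg "t(9;22)(q34;q11)"
    m = re.fullmatch(rf"t\({CHROM};{CHROM}\)\({BAND};{BAND}\)", term)
    if m:
        c1, c2, b1, b2 = m.groups()
        return (f"Reciprocal translocation between chromosomes {c1} and {c2} "
                f"with break points at {b1} and {b2}")

    # Deletion written as "del(2q)"
    m = re.fullmatch(rf"del\({CHROM}([pq])\)", term)
    if m:
        return f"Deletion of part of the {ARM[m.group(2)]} arm of chromosome {m.group(1)}"

    # Arm-level loss or gain written as "2q-" or "4p+"
    m = re.fullmatch(CHROM + r"([pq])([+-])", term)
    if m:
        chrom, arm, sign = m.groups()
        if sign == "-":
            return f"Deletion of part of the {ARM[arm]} arm of chromosome {chrom}"
        return f"Addition of material to the {ARM[arm]} arm of chromosome {chrom}"

    return f"Unrecognized term: {term}"


for example in ["–1", "+7", "2q–", "del (2q)", "4p+", "t(9;22)(q34;q11)"]:
    print(example, "->", describe_abnormality(example))
```

Terms outside these patterns (isochromosomes and inversions, for instance) fall through to the "Unrecognized term" branch rather than being guessed at.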
TABLE 2–2 Common chromosomal abnormalities in lymphoid and myeloid malignancies.
|Malignancy ||Chromosomal Aberration* ||Molecular Lesion |
|Acute myeloid leukemia (AML) || || |
| M1, M2 subtypes ||t(8;21)(q22;q22) ||AML1-MTG8 fusion |
| M3 subtype ||t(15;17)(q22;q11.2) ||PML-RARA fusion |
| M4Eo subtype ||inv(16)(p13;q22) or t(16;16)(p13;q22) ||MYH11-CBFB fusion |
| M2 or M4 subtypes ||t(6;9)(p23;q34) ||DEK-CAN fusion |
|Therapy-related AML ||–5/del(5q), –7/del(7q) || |
|Chronic myeloid leukemia (CML) ||t(9;22)(q34;q11) (Ph1 chromosome) ||BCR-ABL fusion encoding p210 protein |
|CML blast crisis ||t(9;22)(q34;q11), +8, +Ph1, +19, or i(17q) ||BCR-ABL fusion encoding p210 protein, TP53 mutation |
|Acute lymphocytic leukemia (ALL) ||t(9;22)(q34;q11) ||BCR-ABL fusion encoding p190 protein |
|Pre-B ALL ||t(1;19)(q23;p13.3) ||E2A-PBX1 fusion |
|Pre-B ALL ||t(17;19)(q22;p13.3) ||E2A-HLF fusion |
|B-ALL, Burkitt lymphoma || ||Translocations between myc and IgH, IgLκ and IgLλ loci |
|B-Chronic lymphocytic leukemia ||+12,t(14q32) ||Translocations of IgH locus |
The photograph on the left (A) shows a typical karyotype from a patient with chronic myelogenous leukemia. By international agreement, the chromosomes are numbered according to their appearance following G-banding. Note the loss of material from the long arm of one copy of the chromosome 22 pair (the chromosome on the right) and its addition to the long arm of 1 copy of chromosome 9 (also the chromosome on the right of the pair). B) A schematic illustration of the accepted band pattern for this rearrangement. The green and red lines indicate the precise position of the break points that are involved. The karyotypic nomenclature for this particular chromosomal abnormality is t(9;22)(q34;q11). This description means that there is a reciprocal translocation between chromosomes 9 and 22 with break points at q34 on chromosome 9 and q11 on chromosome 22. The rearranged chromosome 22 is sometimes called the Philadelphia chromosome (or Ph chromosome), after the city of its discovery.
The study of solid tumors has been facilitated by new analytic approaches that combine elements of conventional cytogenetics with molecular methodologies. This new hybrid discipline is called molecular cytogenetics, and its application to tumor analysis usually involves the use of techniques based on fluorescence in situ hybridization or FISH (see Sec. 2.2.6).
2.2.2 Hybridization and Nucleic Acid Probes
DNA is composed of 2 complementary strands (the sense strand and the antisense strand) of specific sequences of 4 nucleotide bases that make up the genetic alphabet. The association, via hydrogen bonds, between 2 bases on opposite complementary strands of DNA (or of certain types of RNA) is called a base pair (often abbreviated bp). In the canonical Watson-Crick DNA base pair, adenine (A) forms a base pair with thymine (T) and guanine (G) forms a base pair with cytosine (C). In RNA, thymine is replaced by uracil (U). There are 2 processes that rely on this base pairing (Fig. 2–2). As DNA replicates during the S phase of the cell cycle, a part of the helical DNA molecule unwinds and the strands separate under the action of topoisomerase II (see Chap. 18, Fig. 18–13). DNA polymerase enzymes add nucleotides to the 3′-hydroxyl (3′-OH) end of an oligonucleotide that is hybridized to a template, thus leading to synthesis of a complementary new strand of DNA. Transcription of messenger RNA (mRNA) takes place through an analogous process under the action of RNA polymerase with one of the DNA strands (the antisense strand) acting as a template; complementary bases (U, G, C, and A) are added to the mRNA through pairing with bases in the DNA strand so that the sequence of bases in the RNA is the same as in the "sense" strand of the DNA (except that U replaces T). During this process the DNA strand is separated temporarily from its partner through the action of topoisomerase I (see Chap. 18, Sec. 18.4). Only parts of the DNA in each gene are translated into polypeptides, and these coding regions are known as exons; non-coding regions (introns) are interspersed throughout the genes and are spliced out of the mRNA transcript during the RNA maturation process and before protein synthesis.
Synthesis of polypeptides, the building blocks of proteins, is then directed by the mRNA in association with ribosomes, with each triplet of bases in the exons of the DNA encoding a specific amino acid that is added to the polypeptide chain.
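The relationship between the template strand, the mRNA, and the polypeptide can be illustrated with a short sketch. The example sequence is invented, the codon table is deliberately partial (only the 4 codons used here), and the function names are illustrative assumptions; splicing of introns is ignored.

```python
# Base pairing used during transcription: DNA template base -> RNA base
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

# Partial codon table: only the codons needed for this toy example
CODONS = {"AUG": "Met", "UUU": "Phe", "GGC": "Gly", "UAA": "Stop"}


def transcribe(template_3to5: str) -> str:
    """Build the mRNA (5'->3') by pairing bases with the template strand (read 3'->5')."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5)


def translate(mrna: str) -> list[str]:
    """Read the mRNA one triplet at a time until a stop codon is reached."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODONS[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break
        peptide.append(amino_acid)
    return peptide


sense_strand = "ATGTTTGGCTAA"      # 5'->3' (hypothetical sequence)
template_strand = "TACAAACCGATT"   # its partner, written 3'->5'

mrna = transcribe(template_strand)
print(mrna)                                   # same as the sense strand with U for T
print(mrna == sense_strand.replace("T", "U"))
print(translate(mrna))
```

The second printed line checks the statement in the text that the mRNA has the same base sequence as the sense strand, apart from U replacing T.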
The DNA duplex molecule, also called the double helix, consists of 2 strands that wind around each other. The strands are held together by chemical attraction of the bases that comprise the DNA. A bonds to T and G bonds to C. The bases are linked together to form long strands by a "backbone" chemical structure. The DNA bases and backbone twist around to form a duplex spiral.
To develop an understanding of the techniques now used in both clinical cancer care and research, it is necessary to understand the specificity of hybridization and the action and fidelity of DNA polymerases. When double-stranded DNA is heated, the complementary strands separate (denature) to form single-stranded DNA. Given suitable conditions, separated complementary regions of specific DNA sequences can join together to reform a double-stranded molecule. This renaturation process is called hybridization. This ability of single-stranded nucleic acids to hybridize with their complementary sequence is fundamental to the majority of techniques used in molecular genetic analysis. Using an appropriate reaction mixture containing the relevant nucleotides and DNA or RNA polymerase, a specific piece of DNA can be copied or transcribed. If radiolabeled or fluorescently labeled nucleotides are included in a reaction mixture, the complementary copy of the template can be used as a highly sensitive hybridization-dependent probe.
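The specificity of hybridization can be sketched in a few lines: a single-stranded probe can anneal only where the target contains its reverse complement. This is a simplified illustration that assumes perfect matching and ignores temperature, salt concentration, and partial mismatches; the sequences and function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the strand that would pair with `seq`, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_hybridization_sites(target: str, probe: str) -> list[int]:
    """Return 0-based positions on `target` (5'->3') where `probe` can pair perfectly."""
    site = reverse_complement(probe)
    return [i for i in range(len(target) - len(site) + 1)
            if target[i:i + len(site)] == site]


target = "AAGCTTGACCTGAATTCGG"          # denatured single strand (hypothetical)
probe = reverse_complement("GACCTG")    # labeled probe designed against GACCTG

print(probe)
print(find_hybridization_sites(target, probe))
```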
2.2.3 Restriction Enzymes and Manipulation of Genes
Restriction enzymes are endonucleases that have the ability to cut DNA only at sites of specific nucleotide sequences and always cut the DNA at exactly the same place within the designated sequence. Figure 2–3 illustrates some commonly used restriction enzymes together with the sequence of nucleotides that they recognize and the position at which they cut the sequence. Restriction enzymes are important because they allow DNA to be cut into reproducible segments that can be analyzed precisely. An important feature of many restriction enzymes is that they create sticky ends. These ends occur because the DNA is cut in a different place on the 2 strands. When the DNA molecule separates, the cut end has a small single-stranded portion that can hybridize to other fragments having compatible sequences (ie, fragments digested using the same restriction enzyme) thus allowing investigators to cut and paste pieces of DNA together.
The nucleotide sequences recognized by 5 different restriction endonucleases are shown. On the left side, the sequence recognized by the enzyme is shown; the sites where the enzymes cut the DNA are shown by the arrows. On the right side, the 2 fragments produced following digestion with that restriction enzyme are shown. Note that each recognition sequence is a palindrome; ie, the first 2 or 3 bases are complementary to the last 2 or 3 bases. For example, for EcoRI, GAA is complementary to TTC. Also note that following digestion, each fragment has a single-stranded tail of DNA. This tail is useful in allowing fragments that contain complementary overhangs to anneal with each other.
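Both properties described above, the palindromic recognition sequence and the reproducible cut position, can be checked with a small sketch. It models only the top strand and uses EcoRI, which recognizes GAATTC and cuts after the first G; the input sequence and the function names are illustrative assumptions.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_palindromic(site: str) -> bool:
    """A site is palindromic when it reads the same 5'->3' on both strands."""
    return site == reverse_complement(site)


def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut the top strand `cut_offset` bases into every occurrence of `site`."""
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + cut_offset])
        start = pos + cut_offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


print(is_palindromic("GAATTC"))
print(digest("CCGAATTCGGTTGAATTCAA"))
```

Every internal fragment begins with AATT, the single-stranded tail that lets fragments cut with the same enzyme anneal to one another.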
Once a gene has been identified, the DNA segment of interest can be inserted, using restriction enzymes, into a bacterial virus or plasmid to facilitate its manipulation and propagation. A complementary DNA strand (cDNA) is first synthesized using mRNA as the template by a reverse transcriptase enzyme. This cDNA contains only the exons of the gene from which the mRNA was transcribed. Figure 2–4 presents a schematic of how a restriction fragment of DNA containing the coding sequence of a gene can be inserted into a bacterial plasmid conferring resistance against the drug ampicillin to the host bacterium. The plasmid or virus is referred to as a vector carrying the passenger DNA sequence of the gene of interest. The vector DNA is cut with the same restriction enzyme used to prepare the cloned gene, so that all the fragments will have compatible sticky ends and can be spliced back together. The spliced fragments can be sealed with the enzyme DNA ligase, and the reconstituted molecule can be introduced into bacterial cells. Because bacteria that take up the plasmid are resistant to the drug (eg, ampicillin), they can be isolated and propagated to large numbers. In this way, large quantities of a gene can be obtained (ie, cloned) and labeled with either radioactivity or biotin for use as a DNA probe for analysis in Southern or northern blots (see Sec. 2.2.4). Cloned DNA can be used directly for nucleotide sequencing (see Sec. 2.2.10), or for transfer into other cells. Alternatively, the starting DNA may be a complex mixture of different restriction fragments derived from human cells. Such a mixture could contain enough DNA so that the entire human genome is represented in the passenger DNA inserted into the vectors. When a large number of different DNA fragments have been inserted into a vector population and then introduced into bacteria, the result is a DNA library, which can be plated out and screened by hybridization with a specific probe.
In this way an individual recombinant DNA clone can be isolated from the library and used for most of the other applications described in the following sections.
Insertion of a gene into a bacterial plasmid. The cDNA of interest (pink line) is digested with a restriction endonuclease (depicted by scissors) to generate a defined fragment of cDNA with "sticky ends." The circular plasmid DNA is cut with the same restriction endonuclease to generate single-stranded ends that will hybridize to the cDNA fragment. The recombinant DNA plasmid can be selected for growth using antibiotics because the ampicillin-resistance gene (hatched) is included in the construct. In this way, large amounts of the human cDNA can be obtained for further purposes (eg, for use as a probe on a Southern blot).
2.2.4 Blotting Techniques
Southern blotting is a method for analyzing the structure of DNA (named after the scientist who developed it). Figure 2–5 outlines schematically the Southern blot technique. The DNA to be analyzed is cut into defined lengths using a restriction enzyme, and the fragments are separated by electrophoresis through an agarose gel. Under these conditions the DNA fragments are separated based on size, with the smallest fragments migrating farthest in the gel and the largest remaining near the origin. Pieces of DNA of known size are electrophoresed at the same time (in a separately loaded well) and act as a molecular mass marker. A piece of nylon membrane is then laid on top of the gel and a vacuum is applied to draw the DNA through the gel into the membrane, where it is immobilized. A common application of the Southern technique is to determine the size of the fragment of DNA that carries a particular gene. The nylon membrane containing all the fragments of DNA cut with a restriction enzyme is incubated in a solution containing a radioactive or fluorescently-labeled probe which is complementary to part of the gene (see Sec. 2.2.2). Under these conditions, the probe will anneal with homologous DNA sequences present on the DNA in the membrane. After gentle washing to remove the single-stranded, unbound probe, the only labeled probe remaining on the membrane will be bound to homologous sequences of the gene of interest. The location of the gene on the nylon membrane can then be detected either by the fluorescence or radioactivity associated with the probe. An almost identical procedure can be used to characterize mRNA separated by electrophoresis and transferred to a nylon membrane. The technique is called northern blotting and is used to evaluate the expression patterns of genes. An analogous procedure, called western blotting, is used to characterize proteins. 
Following separation by denaturing gel electrophoresis, the proteins are immobilized by transfer to a charged synthetic membrane. To identify specific proteins, the membrane is incubated in a solution containing a specific primary antibody that is either directly labeled with a fluorophore or subsequently detected with a secondary antibody that binds to the primary antibody and is conjugated to horseradish peroxidase (HRP) or biotin. The primary antibody will bind only to the region of the membrane containing the protein of interest and can be detected either directly by its fluorescence or by exposure to chemiluminescence detection reagents.
Analysis of DNA by Southern blotting. Schematic outline of the procedures involved in analyzing DNA fragments by the Southern blotting technique. The method is described in more detail in the text.
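The role of the size markers run alongside the sample can be made concrete with a small calculation. The sketch below assumes that, between two neighboring markers, the logarithm of fragment size falls roughly linearly with migration distance; this is a common working approximation rather than something stated in the text, and the ladder values and function name are hypothetical.

```python
import math


def estimate_size_bp(distance_mm: float, ladder: list[tuple[float, float]]) -> float:
    """Estimate fragment size from its migration distance.

    `ladder` holds (size_bp, distance_mm) pairs for the marker fragments.
    Interpolation is linear in log10(size) between the two bracketing markers.
    """
    points = sorted(ladder, key=lambda p: p[1])  # order by distance migrated
    for (size1, dist1), (size2, dist2) in zip(points, points[1:]):
        if dist1 <= distance_mm <= dist2:
            frac = (distance_mm - dist1) / (dist2 - dist1)
            log_size = math.log10(size1) + frac * (math.log10(size2) - math.log10(size1))
            return 10 ** log_size
    raise ValueError("distance lies outside the range covered by the ladder")


# Hypothetical marker lane: the smallest fragment has migrated farthest
ladder = [(10000, 10.0), (1000, 30.0), (100, 50.0)]

print(round(estimate_size_bp(20.0, ladder)))  # band of interest at 20 mm
```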
2.2.5 The Polymerase Chain Reaction
The polymerase chain reaction (PCR) allows rapid production of large quantities of specific pieces of DNA (usually about 200 to 1000 base pairs) using a DNA polymerase enzyme called Taq polymerase (which is isolated from a thermophilic bacterial species and is thus resistant to denaturation at high temperatures). Specific oligonucleotide primers complementary to the DNA at each end of (flanking) the region of interest are synthesized or obtained commercially, and are used as primers for Taq polymerase. All components of the reaction (the target DNA, primers, deoxynucleotides, and Taq polymerase) are placed in a small tube and the reaction sequence is accomplished by simply changing the temperature of the reaction mixture in a cyclical manner (Fig. 2–6A). A typical PCR reaction would involve: (a) Incubation at 94°C to denature (separate) the DNA duplex and create single-stranded DNA. (b) Incubation at 53°C to allow hybridization of the primers, which are in vast excess (this temperature may vary depending on the sequence of the primers). (c) Incubation at 72°C to allow Taq polymerase to synthesize new DNA from the primers. Repeating this cycle permits another round of amplification (Fig. 2–6B). Each cycle takes only a few minutes. Twenty cycles can theoretically produce a million-fold amplification of the DNA of interest. PCR products can then be sequenced or subjected to other methods of genetic analysis. Polymerase proteins with greater heat stability and copying fidelity can allow for long-range amplification using primers separated by as much as 15 to 30 kilobases of intervening target DNA (Ausubel and Waggoner, 2003). The PCR is exquisitely sensitive and its applications include the detection of minimal residual disease in hematopoietic malignancies and of circulating cancer cells from solid tumors.
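The million-fold figure follows from the doubling that occurs in each cycle. A brief sketch of the arithmetic is given below; the `efficiency` parameter (the fraction of templates copied per cycle) is an assumption added for illustration, with 1.0 representing the theoretical ideal described in the text.

```python
import math


def copies_after(cycles: int, start_copies: float = 1, efficiency: float = 1.0) -> float:
    """Copies present after `cycles` rounds; each round multiplies by (1 + efficiency)."""
    return start_copies * (1 + efficiency) ** cycles


def cycles_needed(fold: float, efficiency: float = 1.0) -> int:
    """Smallest whole number of cycles giving at least `fold` amplification."""
    return math.ceil(math.log(fold) / math.log(1 + efficiency))


print(copies_after(20))                  # ideal doubling: 2**20
print(cycles_needed(1_000_000))          # cycles for a million-fold gain
print(copies_after(20, efficiency=0.9))  # lower yield when doubling is imperfect
```

With perfect doubling, 20 cycles give 2^20 = 1,048,576 copies per starting molecule, which is the basis of the "million-fold" estimate.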
A) Reaction sequence for 1 cycle of PCR. Each line represents 1 strand of DNA; the small rectangles are primers and the circles are nucleotides. B) The first 3 cycles of PCR are shown schematically. C) Ethidium bromide-stained gel after 20 cycles of PCR. See text for further explanation. D) Real-time PCR using SYBR Green dye. SYBR Green dye binds preferentially to double-stranded DNA; therefore, an increase in the concentration of a double-stranded DNA product leads to an increase in fluorescence. During the polymerization step, several molecules of the dye bind to the newly synthesized DNA and a significant increase in fluorescence is detected and can be monitored in real time. E) Real-time PCR using fluorescent dyes and molecular beacons. During denaturation, both probe and primers are in solution and remain unbound from the DNA strand. During annealing, the probe specifically hybridizes to the target DNA between the primers (top panel) and the 5′-to-3′ exonuclease activity of the DNA polymerase cleaves the probe, thus dissociating the quencher molecule from the reporter molecule, which results in fluorescence of the reporters.
PCR is widely used to study gene expression or screen for mutations in RNA. Reverse transcriptase is used to make a single-strand cDNA copy of an mRNA and the cDNA is used as a template for a PCR reaction as described above. This technique allows amplification of cDNA corresponding to both abundant and rare RNA transcripts. The development of real-time quantitative PCR has allowed improved quantitation of the DNA (or cDNA) template and has proven to be a sensitive method to detect low levels of mRNA (often obtained from small samples or microdissected tissues) and to quantify gene expression. Different chemistries are available for real-time detection (Fig. 2–6D, E). There is a very specific 5′ nuclease assay, which uses a fluorogenic probe for the detection of reaction products after amplification, and there is a less specific but much less expensive assay, which uses a fluorescent dye (SYBR Green I) for the detection of double-stranded DNA products. In both methods, the fluorescence emission from each sample is collected by a charge-coupled device camera and the data are automatically processed and analyzed by computer software. Quantitative real-time PCR using fluorogenic probes can analyze multiple genes simultaneously within the same reaction. The SYBR Green methodology involves individual analysis of each gene of interest but, using multiwell plates, both approaches provide high-throughput sample analysis with no need for post-PCR processing or gels.
2.2.6 Fluorescence in Situ Hybridization
To perform fluorescence in situ hybridization (FISH), DNA probes specific for a gene or particular chromosome region are labeled (usually by incorporation of biotin, digoxigenin, or directly with a fluorochrome) and then hybridized to (denatured) metaphase chromosomes. The DNA probe will reanneal to the denatured DNA at its precise location on the chromosome. After washing away the unbound probe, the hybridized sequences are detected using avidin (which binds strongly to biotin) or antibodies to digoxigenin, which are in turn detected with secondary antibodies coupled to a fluorochrome such as fluorescein isothiocyanate. The sites of hybridization are then detected using fluorescence microscopy. The main advantage of FISH for gene analyses is that information is obtained directly about the positions of the probes in relation to chromosome bands or to other previously or simultaneously mapped reference probes.
FISH can be performed on interphase nuclei from paraffin-embedded tumor biopsies or cultured tumor cells, which allows cytogenetic aberrations such as amplifications, deletions or other abnormalities of whole chromosomes to be visualized without the need for obtaining good-quality metaphase preparations. For example, FISH is a standard technique to determine the HER2 status of breast cancers and can be used to detect N-myc (MYCN) amplification in neuroblastoma (Fig. 2–7). Whole chromosome abnormalities can also be detected using specific centromere probes that lead to 2 signals from normal nuclei, 1 signal when there is only 1 copy of the chromosome (monosomy), or 3 signals when there is an extra copy (trisomy). Chromosome or gene deletions can also be detected with probes from the relevant regions. Similarly, if the probes used for FISH are close to specific translocation break points on different chromosomes, they will appear joined as a result of the translocation, generating a "color fusion" signal; conversely, alternative probes can be designed to "break apart" in the event of a specific gene deletion or translocation. This technique is particularly useful for the detection of the BCR-ABL rearrangement in chronic myeloid leukemia (Fig. 2–8) and the TMPRSS2-ERG abnormalities in prostate cancer (Fig. 2–9).
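The interpretation of centromere-probe signals described above amounts to a simple lookup. The sketch below takes the most frequent signal count across the scored nuclei as the call; using the modal count is an assumption made for illustration (in practice, scoring relies on laboratory-defined cutoffs), and the counts shown are invented.

```python
from collections import Counter

INTERPRETATION = {1: "monosomy", 2: "normal (disomy)", 3: "trisomy"}


def call_copy_number(signals_per_nucleus: list[int]) -> tuple[int, str]:
    """Return the most common centromere-signal count and its interpretation."""
    count, _ = Counter(signals_per_nucleus).most_common(1)[0]
    return count, INTERPRETATION.get(count, "other copy number")


# Hypothetical signal counts scored in 8 interphase nuclei
print(call_copy_number([3, 3, 2, 3, 3, 3, 2, 3]))
```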
MYCN amplification in nuclei from neuroblastoma detected by FISH with a MYCN probe (magenta speckling) and a deletion of the short arm of chromosome 1. The signal (pale blue-green) from the remaining chromosome 1 is seen as a single spot in each nucleus.
Detection of the Philadelphia chromosome in interphase nuclei of leukemia cells. All nuclei contain 1 green signal (BCR gene), 1 pink signal (ABL gene), and an intermediate fusion yellow signal because of the 9;22 chromosome translocation.
FISH analysis showing rearrangement of TMPRSS2 and ERG genes in PCa. A) FISH confirms the colocalization of Oregon Green-labeled 5′ ERG (green signals), AlexaFluor 594-labeled 3′ ERG (red signals), and Pacific Blue-labeled TMPRSS2 (light blue signals) in normal peripheral lymphocyte metaphase cells and in normal interphase cells. B) In PCa cells, break-apart FISH results in a split of the colocalized 5′ green/3′ red signals, in addition to a fused signal (comprising green, red, and blue signals) of the unaffected chromosome 21. Using the TMPRSS2/ERG set of probes on PCa frozen sections, TMPRSS2 (blue signal) remains juxtaposed to ERG 3′ (red signal; see white arrows), whereas colocalized 5′ ERG signal (green) is lost, indicating the presence of TMPRSS2/ERG fusion and concomitant deletion of 5′ ERG region. (Reproduced with permission from Yoshimoto et al, 2006.)
2.2.7 Comparative Genomic Hybridization
If the cytogenetic abnormalities are unknown, it is not possible to select a suitable probe to clarify the abnormalities by FISH. Comparative genomic hybridization (CGH) has been developed to produce a detailed map of the differences between chromosomes in different cells by detecting increases (amplifications) or decreases (deletions) of segments of DNA.
For analysis of tumors by CGH, the DNA from malignant and normal cells is labeled with 2 different fluorochromes and then hybridized simultaneously to normal chromosome metaphase spreads. For example, tumor DNA is labeled with biotin and detected with fluorescein (green fluorescence) while the control DNA is labeled with digoxigenin and detected with rhodamine (red fluorescence). Regions of gain or loss of DNA, such as deletions, duplications, or amplifications, are seen as changes in the ratio of the intensities of the 2 fluorochromes along the target chromosomes. One disadvantage of CGH is that it can detect only large blocks (>5 Mb) of over- or underrepresented chromosomal DNA, and balanced rearrangements (such as inversions or translocations) can escape detection. Improvements to the original CGH technique have used microarrays where CGH is applied to arrayed sequences of DNA bound to glass slides. The arrays are constructed using genomic clones of various types such as bacterial artificial chromosomes (a DNA construct that can be used to carry 150 to 350 kbp [kilobase pairs] of normal DNA) or synthetic oligonucleotides that are spaced across the entire genome. This technique has allowed the detection of genetic aberrations of smaller magnitude than was possible using metaphase chromosomes, although array CGH has now been superseded by high-density single-nucleotide polymorphism (SNP) arrays (see below).
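The ratio-based reading of CGH data can be sketched as follows. Each chromosomal segment is classified from the log2 ratio of tumor (green) to control (red) fluorescence; the threshold of 0.3 and the intensity values are arbitrary assumptions chosen for illustration, not values taken from the text.

```python
import math


def classify_segments(tumor: list[float], control: list[float],
                      threshold: float = 0.3) -> list[str]:
    """Label each segment as gain, loss, or balanced from its log2(tumor/control) ratio."""
    calls = []
    for green, red in zip(tumor, control):
        log_ratio = math.log2(green / red)
        if log_ratio > threshold:
            calls.append("gain")
        elif log_ratio < -threshold:
            calls.append("loss")
        else:
            calls.append("balanced")
    return calls


# Hypothetical fluorescence intensities for 4 segments along one chromosome
tumor_green = [100, 210, 48, 105]
control_red = [100, 100, 100, 100]

print(classify_segments(tumor_green, control_red))
```

A balanced translocation would leave every ratio near 1 and so be labeled "balanced" throughout, which is why such rearrangements can escape detection by CGH.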
2.2.8 Spectral Karyotyping/Multifluor Fluorescence in Situ Hybridization
A deficiency of both array CGH and conventional cDNA microarrays is the lack of information about structural changes within the karyotype. For example, with an expression array, a particular gene may be overexpressed but it would be unclear whether this is secondary to a translocation placing the gene next to a strong promoter or to an amplification. Universal chromosome painting techniques, with which it is possible to analyze all chromosomes simultaneously, have been developed to assist in this determination. Two commonly used techniques, spectral karyotyping (SKY) (Veldman et al, 1997) and multifluor fluorescence in situ hybridization (M-FISH) (Speicher et al, 1996), are based on the differential display of colored fluorescent chromosome-specific paints, which provide a complete analysis of the chromosomal complement in a given cell. Using this combination of 23 different colored paints as a "cocktail probe," subtle differences in fluorochrome labeling of chromosomes after hybridization allow a computer to assign a unique color to each chromosome pair. Abnormal chromosomes can be identified by the pattern of color distribution along them, with chromosomal rearrangements leading to a distinct transition from one color to another at the position of the breakpoint (Fig. 2–10). In contrast to CGH, detection of such karyotype rearrangements using SKY and M-FISH is not dependent upon change in copy number. This technology is particularly suited to solid tumors where the complexity of the karyotypes may mask the presence of chromosomal aberrations.
SKY and downstream analyses of a patient with a translocation. One of the aberrant chromosomes can initially be seen with G banding; the same metaphase spread is then subjected to SKY, and a 12;14 reciprocal translocation is identified.
2.2.9 Single-Nucleotide Polymorphisms
DNA sequences can differ at single nucleotide positions within the genome. These SNPs can occur as frequently as 1 in every 1000 base pairs and can occur in both introns and exons. In introns they generally have little effect, but in exons they can affect protein structure and function. For example, SNPs may be involved in altered drug metabolism because of their modifying effect on the cytochrome P450 metabolizing enzymes. They also contribute to disease (eg, SNPs that result in missense mutations) and disease predisposition. Most early methods to characterize SNPs required PCR amplification of the sample to be genotyped prior to sequence analysis; modern methods of gene sequencing and array analyses, however, have largely replaced this older technique. One application of SNPs in cancer medicine has been the use of SNP arrays in genomic analyses. These DNA microarrays use tiled SNP probes to some of the 50 million SNPs in the human genome to interrogate genomic architecture. For example, SNP arrays can be used to study such phenomena as loss of heterozygosity (LOH) and amplifications. Indeed, the particular advantage of SNP arrays is that they can detect copy-neutral LOH (also known as uniparental disomy or gene conversion) whereby one allele or whole chromosome is missing and the other allele is duplicated with potential pathological consequences.
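The logic behind calling copy-neutral LOH from SNP array data can be outlined with a simplified sketch. It assumes paired normal and tumor genotype calls at the same SNPs (written "AA", "AB", or "BB") together with an estimated tumor copy number for the region; the genotype encoding, the function name, and the example data are all illustrative assumptions, and real analyses work with noisy allele-frequency signals rather than clean calls.

```python
def classify_region(normal_genotypes: list[str], tumor_genotypes: list[str],
                    tumor_copy_number: int) -> str:
    """Classify a region from paired genotype calls and the tumor copy number."""
    # Only SNPs that are heterozygous in normal tissue can reveal LOH
    informative = [(n, t) for n, t in zip(normal_genotypes, tumor_genotypes) if n == "AB"]
    if not informative:
        return "uninformative"

    lost_heterozygosity = all(t in ("AA", "BB") for _, t in informative)
    if not lost_heterozygosity:
        return "heterozygosity retained"
    if tumor_copy_number == 2:
        return "copy-neutral LOH"  # one allele lost, the other duplicated
    if tumor_copy_number < 2:
        return "LOH with copy-number loss"
    return "LOH with copy-number gain"


print(classify_region(["AB", "AA", "AB"], ["AA", "AA", "BB"], tumor_copy_number=2))
```

A method that measures only copy number would see 2 copies in this region and report nothing abnormal, which is the advantage of SNP arrays noted in the text.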
To characterize the primary structure of genes, and thus of the potential repertoire of proteins that they encode, it is necessary to determine the sequence of their DNA. Sanger sequencing (the classical method) relied on oligonucleotide primer extension and dideoxy-chain termination (dideoxynucleotides [ddNTPs] lack the 3′-OH group required for the phosphodiester bond between 2 nucleotides). DNA sequencing was carried out in 4 separate reactions each containing 1 of the 4 ddNTPs (ie, ddATP, ddCTP, ddGTP, or ddTTP) together with the normal deoxynucleotides (dNTPs). In each reaction, the same primer was used to ensure DNA synthesis began at the same nucleotide. The extended primers therefore terminated at different sites whenever a specific ddNTP was incorporated. This method produced fragments of different sizes terminating at different 3′ nucleotides. The newly synthesized and labeled DNA fragments were heat-denatured and then separated by size with gel electrophoresis, with each of the 4 reactions in individual adjacent lanes (lanes A, T, G, C); the DNA bands were then visualized by autoradiography or UV light, and the DNA sequence could be directly interpreted from the x-ray film or gel image (Fig. 2–11). Using this method it was possible to obtain a sequence of 200 to 500 bases in length from a single gel. The next development was automated Sanger sequencing, which involved the development of fluorescently labeled primers (dye primers) and ddNTPs (dye terminators). With the automated procedures the reactions are performed in a single tube containing all 4 ddNTPs, each labeled with a different fluorescent dye. Since the 4 dyes fluoresce at different wavelengths, a laser then reads the gel to determine the identity of each band according to the wavelengths at which it fluoresces. The results are then depicted in the form of a chromatogram, which is a diagram of colored peaks that correspond to the nucleotide in that location in the sequence.
Then sequencing analysis software interprets the results, identifying the bases from the fluorescent intensities (Fig. 2–12).
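The way a sequence is read from the 4 lanes of a classical Sanger gel can be imitated in a few lines. This is an idealized sketch: it assumes that termination occurs at every possible position, ignores the length of the primer, and uses a short invented sequence; the function names are illustrative.

```python
def sanger_lanes(new_strand: str) -> dict[str, list[int]]:
    """For each ddNTP reaction, list the fragment lengths that end in that base."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        # A fragment of this length exists in the lane of the ddNTP that ended it
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read bands from the shortest to the longest fragment to recover the sequence."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


lanes = sanger_lanes("GATTACA")  # hypothetical newly synthesized strand
print(lanes)
print(read_gel(lanes))
```

Automated sequencing follows the same logic, except that the 4 lanes are replaced by 4 dye colors read in a single lane.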
Dideoxy-chain termination sequencing showing an extension reaction to read the position of the nucleotide guanine (see text for details). (Courtesy of Lilly Noble, University of Toronto, Toronto.)
Outline of automated sequencing, followed by an example of automated sequencing of BRCA2, the hereditary breast cancer predisposition gene. Each colored peak represents a different nucleotide. The lower panel is the sequence of the wild-type DNA sample. The sequence of the mutation carrier in the upper panel contains a double peak (indicated by an arrow) in which nucleotide T in intron 17 located 2 bp downstream of the 5′ end of exon 18 is converted to a C. The mutation results in aberrant splicing of exon 18 of the BRCA2 gene. The presence of the T nucleotide, in addition to the mutant C, implies that only 1 copy of the 2 BRCA2 genes is mutated in this sample.
So-called next-generation sequencing (NGS) uses a variety of approaches to automate the sequencing process by creating micro-PCR reactors and/or attaching the DNA molecules to be sequenced to solid surfaces or beads, allowing millions of sequencing events to occur simultaneously. Although the analyzed sequences are generally much shorter (~21 to ~400 base pairs) than in previous sequencing technologies, they can be counted and quantified, allowing the identification of mutations in a small subpopulation of cells that is part of a larger population with wild-type sequences. The recent introduction of approaches that allow sequencing of both ends of a DNA molecule (ie, paired-end massively parallel sequencing or mate-pair sequencing) makes it possible to detect balanced and unbalanced somatic rearrangements (eg, fusion genes) in a genome-wide fashion.
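Because NGS reads can be counted, the fraction of reads carrying a variant base at a given position estimates how common that variant is in the sampled DNA. The following Python sketch illustrates the counting step only. It assumes reads that are already aligned to a reference without gaps, and the read data and function name are hypothetical.

```python
from collections import Counter


def allele_fractions(reads, position):
    """Return the fraction of each base observed at a reference position.

    `reads` is a list of (start, sequence) pairs, where `start` is the
    0-based reference coordinate of the first base of an ungapped read.
    """
    counts = Counter()
    for start, seq in reads:
        offset = position - start
        if 0 <= offset < len(seq):  # the read covers this position
            counts[seq[offset]] += 1
    depth = sum(counts.values())
    if depth == 0:
        return {}
    return {base: n / depth for base, n in counts.items()}


# Hypothetical data: 18 wild-type reads and 2 reads with an A-to-T change
# at reference position 4.
reads = [(0, "ACGTAC")] * 18 + [(2, "GTTC")] * 2
print(allele_fractions(reads, 4))  # {'A': 0.9, 'T': 0.1}
```

In practice, sequencing errors set a lower limit on the variant fraction that can be distinguished from noise, so a real analysis would also consider base quality and read depth.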
There are several types of NGS machines in routine use that fall into 4 methodological categories: (a) Roche/454, Life/APG, (b) Illumina/Solexa, (c) Ion Torrent, and (d) Pacific Biosciences. It is beyond the scope of this chapter to describe these in detail or to foreshadow developing technologies, but an overview of the key differences is provided below.
Each technology includes a number of steps grouped as (a) template preparation, (b) sequencing/imaging, and (c) data analysis. Initially, all methods involve randomly breaking genomic DNA into small sizes from which either fragment templates (randomly sheared DNA usually <1 kbp in size) or mate-pair templates (linear DNA fragments originating from circularized sheared DNA of a particular size) are created.
There are 2 types of template preparation: clonally amplified templates and single-molecule templates. Clonally amplified templates rely on PCR techniques to amplify the DNA so that fluorescence is detectable when fluorescently labeled nucleotides are added. In emulsion PCR (Fig. 2–13), a library of fragment or mate-pair targets is prepared, and adaptors (short DNA segments) containing universal priming sites are ligated to the target ends, allowing complex genomes to be amplified with common PCR primers. After ligation, the DNA is separated into single strands and captured onto beads under conditions that favor 1 DNA molecule per bead. After successful amplification of the DNA, millions of beads can be chemically cross-linked to an amino-coated glass surface (Life/APG; Ion Torrent) or deposited into individual PicoTiterPlate (PTP) wells (Roche/454). Solid-phase amplification (Fig. 2–14), used in the Illumina/Solexa platform, produces randomly distributed, clonally amplified clusters from fragment or mate-pair templates on a glass slide. High-density forward and reverse primers are covalently attached to the slide, and the ratio of the primers to the template (the DNA segments of interest) on the support defines the surface density of the amplified clusters. These primers can also provide free ends to which a universal primer can be hybridized to initiate the NGS reaction.
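The phrase "conditions that favor 1 DNA molecule per bead" can be made concrete with a small calculation. The Python sketch below assumes that template molecules are distributed among beads at random, so that the number per bead follows a Poisson distribution; this model and the mean values used are illustrative assumptions, not figures from the platforms themselves.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k template molecules on a bead."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def loading_fractions(mean):
    """Fractions of beads carrying 0, exactly 1, or more than 1 template."""
    empty = poisson_pmf(0, mean)
    single = poisson_pmf(1, mean)
    return {"empty": empty, "single": single, "multiple": 1.0 - empty - single}


# Hypothetical mean numbers of template molecules per bead.
for mean in (0.1, 0.5, 1.0):
    f = loading_fractions(mean)
    clonal = f["single"] / (f["single"] + f["multiple"])
    print(f"mean={mean}: empty={f['empty']:.3f}, single={f['single']:.3f}, "
          f"multiple={f['multiple']:.3f}, clonal among loaded={clonal:.3f}")
```

Under this assumption, a dilute template leaves most beads empty but ensures that nearly all loaded beads carry a single template and therefore yield a clonal signal.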
In emulsion PCR (emPCR), a reaction mixture is generated comprising an oil–aqueous emulsion to encapsulate bead–DNA complexes into single aqueous droplets. PCR amplification is subsequently carried out in these droplets to create beads containing thousands of copies of the same template sequence. EmPCR beads can then be chemically attached to a glass slide or a reaction plate. (From Metzker, 2010.)
The 2 basic steps of solid-phase amplification are initial priming and extending of the single-stranded, single-molecule template, and then bridge amplification of the immobilized template with immediately adjacent primers to form clusters. (From Metzker, 2010.)
In general, the preparation of single-molecule templates is more straightforward and requires less starting material (<1 μg) than emulsion PCR or solid-phase amplification. More importantly, these methods do not require PCR, which may create mutations and bias in amplified templates and regions. One variant (Pacific Biosciences; see below) uses spatially distributed single polymerase molecules that are attached to a solid support and to which primed template molecules are bound; these polymerases analyze circularized sheared DNA selected for a given size, such as 2 kbp.
Cyclic reversible termination (CRT) is currently used in the Illumina/Solexa platform. CRT uses reversible terminators in a cyclic method that comprises nucleotide incorporation, fluorescence imaging and cleavage. In the first step, a DNA polymerase, bound to the primed template, adds or incorporates only 1 fluorescently modified nucleotide, complementary to the template base. DNA synthesis is then terminated. Following incorporation, the remaining unincorporated nucleotides are washed away. Imaging is then performed to identify the incorporated nucleotide. This is followed by a cleavage step, which removes the terminating/inhibiting group and the fluorescent dye. Additional washing is performed before starting another incorporation step.
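The cycle of incorporation, imaging, and cleavage can be sketched in Python as follows. This is a simplified model of a single error-free template, not the platform's actual software; the dye colors assigned to each base are arbitrary placeholders, and the template is written in the order in which the polymerase reads it.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Arbitrary dye assignments, for illustration only.
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_DYE = {dye: base for base, dye in DYE.items()}


def crt_read(template, cycles):
    """Simulate cyclic reversible termination on one template.

    Each cycle adds exactly one terminator nucleotide, images its dye,
    and then cleaves the dye and blocking group so the next cycle can start.
    """
    called = []
    for cycle in range(min(cycles, len(template))):
        incorporated = COMPLEMENT[template[cycle]]  # incorporation, then termination
        signal = DYE[incorporated]                  # imaging after unincorporated bases are washed away
        called.append(BASE_FOR_DYE[signal])         # base call from the observed color
        # Cleavage removes the dye and terminating group; nothing to store here.
    return "".join(called)


print(crt_read("TTTGCA", 4))  # AAAC
```

Because only 1 nucleotide is added per cycle, a run of identical bases (here TTT in the template) is read as 3 separate cycles rather than as a single stronger signal.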
Another cyclic method is single-base ligation (SBL) used in the Life/APG platform, which uses a DNA ligase and either 1- or 2-base-encoded probes. In its simplest form, a fluorescently labeled probe hybridizes to its complementary sequence adjacent to the primed template. DNA ligase is then added which joins the dye-labeled probe to the primer. Nonligated probes are washed away, followed by fluorescence imaging to determine the identity of the ligated probe. The cycle can be repeated either by (a) using cleavable probes to remove the fluorescent dye and regenerate a 5′-PO4 group for subsequent ligation cycles or (b) by removing and hybridizing a new primer to the template.
Pyrosequencing (used in the Roche/454 platform) (Fig. 2–15) is a bioluminescence method that measures the incorporation of nucleotides through the release of inorganic pyrophosphate, which is proportionally converted into visible light by a series of enzymatic reactions. Following loading of the DNA-amplified beads into individual PTP wells, additional smaller beads, which are coupled with sulphurylase and luciferase, are added. Nucleotides are then flowed sequentially in a fixed order across the PTP device. If a nucleotide complementary to the template strand is supplied, the polymerase extends the existing DNA strand by adding nucleotide(s). Addition of 1 (or more) nucleotide(s) results in a reaction that generates a light signal that is recorded. The signal strength is proportional to the number of nucleotides incorporated in a single nucleotide flow. The order and intensity of the light peaks are recorded to reveal the underlying DNA sequence.
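Since the light signal in each flow is proportional to the number of nucleotides incorporated, a sequence can be recovered by rounding each signal to the nearest whole number of bases. The Python sketch below illustrates this under simplifying assumptions: the signals are already normalized so that 1.0 corresponds to a single incorporation, and the flow order "TACG" and the signal values are hypothetical.

```python
def decode_flowgram(signals, flow_order="TACG"):
    """Convert normalized light signals into the synthesized sequence.

    `signals[i]` is the light intensity recorded for flow i, scaled so that
    a single incorporated nucleotide gives a value close to 1.0.
    """
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        count = max(0, int(round(signal)))  # number of bases added in this flow
        bases.append(nucleotide * count)
    return "".join(bases)


# Hypothetical flowgram covering two rounds of the T, A, C, G flow order.
signals = [1.02, 0.05, 1.96, 0.03, 0.00, 0.98, 0.10, 3.10]
print(decode_flowgram(signals))  # TCCAGGG
```

Simple rounding becomes unreliable for long homopolymer runs, where the difference between, for example, 7 and 8 incorporations is small relative to signal noise.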
Pyrosequencing. After loading of the DNA-amplified beads into individual PicoTiterPlate (PTP) wells, additional beads, coupled with sulphurylase and luciferase, are added. The fiberoptic slide is mounted in a flow chamber, enabling the delivery of sequencing reagents to the bead-packed wells. The underside of the fiberoptic slide is directly attached to a high-resolution camera, which allows detection of the light generated from each PTP well undergoing the pyrosequencing reaction. The light generated by the enzymatic cascade is recorded as a flowgram. PP, Inorganic pyrophosphate. (From Metzker, 2010.)
The method of real-time sequencing (as used in the Pacific Biosciences platform, Fig. 2–16) involves imaging the continuous incorporation of dye-labeled nucleotides during DNA synthesis by attaching single DNA polymerase molecules to the bottom surface of individual wells known as "zero-mode waveguide detectors" that can detect the light from the fluorescent nucleotides as they are incorporated into the elongating primer strand.
Pacific Biosciences' four-color real-time sequencing method. The zero-mode waveguide (ZMW) design reduces the observation volume, thereby reducing the number of stray fluorescently labeled molecules that enter the detection layer in a given period. The residence time of phospholinked nucleotides in the active site is governed by the rate of catalysis and is usually milliseconds. This corresponds to a recorded fluorescence pulse, because only the bound, dye-labeled nucleotide occupies the ZMW detection zone on this timescale. The released, dye-labeled pentaphosphate by-product quickly diffuses away, as does the fluorescence signal. (From Metzker, 2010.)
Ion Torrent sequencing relies on emulsion PCR-amplified particles (ion sphere particles) that are deposited into an array of wells by a short centrifugation step. The sequencing is based on the detection of hydrogen ions that are released during the polymerization of DNA, as opposed to the optical methods used in other sequencing systems. A microwell containing a template DNA strand to be sequenced is flooded with a single type of nucleotide. If the introduced nucleotide is complementary to the leading template nucleotide, it is incorporated into the growing complementary strand. This causes the release of a hydrogen ion that triggers a hypersensitive ion sensor, which indicates that a reaction has occurred. If homopolymer repeats are present in the template sequence, multiple nucleotides will be incorporated in a single cycle. This leads to a corresponding number of released hydrogen ions and a proportionally higher electronic signal.
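The relationship between the template, the nucleotide flows, and the size of the resulting signal can be simulated with a short Python sketch. It models an idealized, error-free well in which the signal equals the number of nucleotides incorporated; the flow order "TACG" and the template are hypothetical, and the template is written in the order in which the polymerase reads it.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def ion_flow_signals(template, flow_order="TACG"):
    """Return (nucleotide, incorporations) for each flow until the template ends.

    The number of incorporations stands in for the number of hydrogen ions
    released, and hence for the size of the electronic signal.
    """
    if set(flow_order) != set(COMPLEMENT):
        raise ValueError("flow_order must contain A, C, G and T")
    signals = []
    pos = 0
    flow = 0
    while pos < len(template):
        nucleotide = flow_order[flow % len(flow_order)]
        count = 0
        # A homopolymer run is extended completely within a single flow.
        while pos < len(template) and COMPLEMENT[template[pos]] == nucleotide:
            count += 1
            pos += 1
        signals.append((nucleotide, count))
        flow += 1
    return signals


print(ion_flow_signals("AGGGT"))
# [('T', 1), ('A', 0), ('C', 3), ('G', 0), ('T', 0), ('A', 1)]
```

The run of 3 G residues in the template produces a single flow with a threefold signal, which is why accurate measurement of signal size matters for homopolymer regions.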
Despite the substantial cost reductions associated with next-generation technologies in comparison with the automated Sanger method, whole-genome sequencing remains expensive, although costs are continuing to fall. In the interim, investigators are using the NGS platforms to target specific regions of interest. This strategy can be used to examine all of the exons in the genome, specific gene families that constitute known drug targets, or megabase-size regions that are implicated in disease or pharmacogenetic effects. Methods for the initial step, which isolates the regions of interest, are known as genomic partitioning and broadly include PCR-based and hybridization-based approaches. In the latter, DNA fragments are generally hybridized to target-specific probes either on a microarray surface or in solution.
The ability to sequence large amounts of DNA at low cost makes the NGS platforms described above useful for many applications, such as discovery of variant alleles through resequencing of targeted regions of interest or whole genomes, de novo assembly of bacterial and lower eukaryotic genomes, cataloguing the mRNAs ("transcriptomes") present in cells, tissues, and organisms (RNA sequencing), and gene discovery.
2.2.11 Variation in Copy Number and Gene Sequence
The recent application of genome-wide analysis to human genomes has led to the discovery of extensive genomic structural variation, ranging from kilobase pairs (kbp) to megabase pairs (Mbp) in size, that is not identifiable by conventional chromosomal banding. These changes are termed copy-number variations (CNVs) and can result from deletions, duplications, triplications, insertions, and translocations; they may account for up to 13% of the human genome (Redon et al, 2006).
Despite extensive studies, the total number, position, size, gene content, and population distribution of CNVs remain elusive. There has not been an accurate molecular method to study smaller rearrangements of 1 to 50 kbp on a genome-wide scale in different populations. Recent analyses revealed 11,700 CNVs involving more than 1000 genes (Redon et al, 2006; Conrad et al, 2010). Wider application of array CGH techniques and NGS is likely to reveal greater structural variation among different individuals and populations, as the majority of CNVs are beyond the resolving capability of current arrays. There are several different classes of CNVs (Fig. 2–17). Entire genes or genomic regions can undergo duplication, deletion and insertion events, whereas multisite variants (MSVs) refer to more complex genomic rearrangements, including concurrent CNVs and mutation or gene conversions (a process by which DNA sequence information is transferred from one DNA helix, which remains unchanged, to another DNA helix, whose sequence is altered). CNVs can be inherited or sporadic; both types may be involved in causing disease including cancer. However, the phenotypic effects of CNVs are unclear and depend on whether dosage-sensitive genes or regulatory sequences are influenced by the genomic rearrangement.
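Array-based and sequencing-based methods commonly express copy number as the log2 ratio of a test signal to a reference signal. The Python sketch below shows this arithmetic for a diploid reference. The gain and loss thresholds of ±0.3 are illustrative assumptions rather than values from the studies cited, and the function names are hypothetical.

```python
import math


def expected_log2_ratio(copies, ploidy=2):
    """Log2 ratio expected for a region present in `copies` copies."""
    return math.log2(copies / ploidy)


def log2_ratio(test_signal, reference_signal):
    """Log2 ratio of a test signal (intensity or read depth) to a reference."""
    return math.log2(test_signal / reference_signal)


def classify_copy_number(ratio, gain=0.3, loss=-0.3):
    """Label a region using illustrative (assumed) thresholds."""
    if ratio >= gain:
        return "gain"
    if ratio <= loss:
        return "loss"
    return "neutral"


print(expected_log2_ratio(1))                        # -1.0 (heterozygous deletion)
print(round(expected_log2_ratio(3), 3))              # 0.585 (single-copy gain)
print(classify_copy_number(log2_ratio(150, 100)))    # gain
print(classify_copy_number(log2_ratio(1020, 1000)))  # neutral
```

In tumor samples the observed ratios are usually smaller in magnitude than these expected values, because normal cells in the specimen dilute the tumor signal.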
A) Outline of the classes of CNVs in the human genome. B) The chromosomal locations of 1447 copy number variation regions (a region covered by overlapping CNVs) are indicated by lines to either side of the ideograms. Green lines denote CNVRs associated with segmental duplications; blue lines denote CNVRs not associated with segmental duplications. The length of right-hand side lines represents the size of each CNVR. The length of left-hand side lines indicates the frequency with which a CNVR is detected (minor call frequency among 270 HapMap samples). When both platforms identify a CNVR, the maximum call frequency of the two is shown. For clarity, the dynamic range of length and frequency are log transformed (see scale bars). (From Redon et al, 2006.)
Use of high-resolution SNP arrays in cancer genomes has shown that CNVs are frequent contributors to the spectrum of mutations leading to cancer development. In adenocarcinoma of the lung, a total of 57 recurrent copy number changes were detected in a collection of 528 cases (Weir et al, 2007). In 206 cases of glioblastoma, somatic copy number alterations were also frequent, and concurrent gene expression analysis showed that 76% of genes affected by copy number alteration had expression patterns that correlated with gene copy number (Cerami et al, 2010). High-resolution analyses of copy number and nucleotide alterations have been carried out on breast and colorectal cancer (Leary et al, 2008). Individual colorectal and breast tumors had, on average, 7 and 18 copy number alterations, respectively, with 24 and 9 as the average number of protein-coding genes affected by amplification or homozygous deletions.
Heritable germline CNVs may also contribute to cancer. For example, a heritable CNV at chromosome 1q21.1 contains the NBPF23 gene, for which copy number is implicated in the development of neuroblastoma (Diskin et al, 2009). Also, a germline deletion at chromosome 2p24.3 is more common in men with prostate cancer, with higher prevalence in patients with aggressive compared with nonaggressive prostate cancer (Liu et al, 2009). However, how CNVs, either somatic or germline, contribute to cancer development is still poorly understood. Possible explanations come from Knudson's two-hit hypothesis (Knudson, 1971): tumor-suppressor genes can be lost as a consequence of a homozygous deletion, leading directly to cancer susceptibility (see Chap. 7, Sec. 7.2.3). Alternatively, heterozygous deletions may harbor genes predisposing to cancer that become unmasked when a functional mutation arises in the other chromosome, resulting in tumor development. Duplications or gains of chromosomal regions may result in increased expression levels of one or more oncogenes. Germline CNVs can provide a genetic basis for subsequent somatic chromosomal changes that arise in tumor DNA.
2.2.12 Microarrays and RNA Analysis
Microarray analysis has been developed to assess expression of the increasing number of genes identified by the Human Genome Project. There are several commercial kits designed to assist with RNA extraction from cells or tissues. The extracted RNA is then usually converted to cDNA with reverse transcriptase, and this may be combined with an RNA amplification step.
The principle of an expression array involves the production of DNA arrays or "chips" on solid supports for large-scale hybridization experiments. Each array consists of a series of thousands of microscopic spots of DNA oligonucleotides, called features, each containing specific DNA sequences, known as probes (or reporters). This approach allows for the simultaneous analysis of the differential expression of thousands of genes and has enhanced understanding of the dynamics of gene expression in cancer cells (Fig. 2–18).
A) The steps required in a microarray experiment from sample preparation to analyses. RT, Reverse transcriptase. For details see text. Briefly, samples are prepared and cDNA is created through reverse transcriptase. The fluorescent label is added either in the RT step or in an additional step after amplification, if present. The labeled samples are then mixed with a hybridization solution that contains light detergents, blocking agents (such as COT1 DNA, salmon sperm DNA, calf thymus DNA, PolyA or PolyT), along with other stabilizers. The mix is denatured and added to a pinhole in a microarray, which can be a gene chip (holes in the back) or a glass microarray. The holes are sealed and the microarray hybridized, either in a hybridization oven (mixed by rotation) or in a mixer (mixed by alternating pressure at the pinholes). After an overnight hybridization, all nonspecific binding is washed off. The microarray is dried and scanned in a special machine where a laser excites the dye and a detector measures its emission. The intensities of the features (several pixels make a feature) are quantified and normalized (see text). (Reproduced with permission from Jacopo Werther/Wikimedia Commons.) B) The output from a typical microarray experiment, a hierarchical clustering of cDNA microarray data obtained from 9 primary laryngeal tumors. Results were visualized using Tree View software, and include the dendrogram (clustering of samples) and the clustering of gene expression, based on genomic similarity. Tree View represents the 946 genes that best distinguish these 2 groups of samples. Genes whose expression is higher in the tumor sample relative to the reference sample are shown in red; those whose expression is lower than the reference sample are shown in green; and no change in gene expression is shown in black. (Courtesy of Patricia Reis and Shilpi Arora, the Ontario Cancer Institute and Princess Margaret Hospital, Toronto.)
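The conversion of feature intensities into the red, green, and black display described in the caption can be sketched in Python. The sketch assumes a simple median-centering normalization and a twofold-change threshold; both are illustrative choices, and the gene names and intensities are hypothetical.

```python
import math
import statistics


def normalized_log_ratios(intensities):
    """Median-centered log2(tumor/reference) ratio for each feature.

    `intensities` maps a feature name to (tumor_signal, reference_signal).
    """
    raw = {name: math.log2(tumor / ref) for name, (tumor, ref) in intensities.items()}
    center = statistics.median(raw.values())
    return {name: ratio - center for name, ratio in raw.items()}


def heatmap_color(log_ratio, threshold=1.0):
    """Red for higher in tumor, green for lower, black for no change."""
    if log_ratio >= threshold:
        return "red"
    if log_ratio <= -threshold:
        return "green"
    return "black"


# Hypothetical feature intensities (tumor, reference).
intensities = {
    "GENE_A": (4000, 1000),
    "GENE_B": (500, 2000),
    "GENE_C": (1200, 1100),
}
ratios = normalized_log_ratios(intensities)
colors = {name: heatmap_color(r) for name, r in ratios.items()}
print(colors)  # {'GENE_A': 'red', 'GENE_B': 'green', 'GENE_C': 'black'}
```

Median centering assumes that most genes are not differentially expressed between the 2 samples; display software typically uses a continuous color scale rather than the 3 discrete colors shown here.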
There are a number of microarray platforms in common use. These platforms include: (a) Spotted arrays, in which DNA fragments (usually created by PCR) or oligonucleotides are immobilized on glass slides. The fragments can be of any length (usually 500 bp to 1 kbp), and the oligonucleotides range from 20 to 100 nucleotides in length. These arrays can be created in individual laboratories using "affordable" equipment. (b) Affymetrix arrays, in which the probes are synthesized using a light mask technology and are typically small (20 to 25 bp) oligonucleotides. (c) NimbleGen arrays, which use a maskless array synthesizer technology in which 786,000 tiny aluminum mirrors direct light in specific patterns. Photodeposition chemistry allows single-nucleotide extensions, with 380,000 or 2.1 million oligonucleotides/array, as the light directs base pairing in specific sequences. (d) Agilent arrays, which use ink-jet printer technology to synthesize oligonucleotides of up to 60 bases through phosphoramidite chemistry. The capacity is 244,000 oligonucleotides/array. The analysis of microarrays is discussed in Section 2.7.1.
All the sequencing approaches described in Section 2.2.10 can be applied to RNA, in some cases simply by converting the RNA to cDNA before analysis. It may also be necessary to remove the ribosomal RNA from the sample to increase the sensitivity of detection. This approach, known as RNA-Seq, is becoming increasingly available, although it remains expensive. The technique has certain advantages over expression microarrays: it does not require preexisting sequence information in order to detect and evaluate transcripts, and it can detect fusion transcripts.