The introduction of next-generation sequencing platforms, coincident with genome-scale preparatory and analytical approaches and the completion of the Human Genome Reference, has ushered in the era of genomics. This chapter introduces the fundamentals of next-generation sequencing methods, provides an overview of the basics of data analysis, and explores the myriad applications developed to exploit the scale and throughput of next-generation sequencing toward questions of biomedical importance. Specifics of cancer genomics, complex disease genomics, and how they pertain to hematologic basic science and clinical practice are discussed, along with the modern-day realities of the consenting process.
Acronyms and Abbreviations:
AML, acute myeloid leukemia; ATAC-seq, uses the hyperactive Tn5 transposase to simultaneously fragment and add sequencing adaptors to accessible DNA; bp, base pair; ChIP-seq, chromatin immunoprecipitation sequencing; ddNTP, di-deoxynucleotide triphosphate; DNase-seq, uses DNase I to fragment DNA based on DNase I hypersensitive sites as a marker of chromatin accessibility; dNTP, deoxynucleotide triphosphate; FAIRE-seq, formalin crosslinking of DNA to proteins prior to random fragmentation; FFPE, formalin-fixed, paraffin embedded; FLT3-ITD, internal tandem duplications of FLT3 gene; Gb, gigabase, i.e., billion base pairs; GINA, The Genetic Information Nondiscrimination Act; GWAS, genome-wide association study; lncRNA, long noncoding RNA; indel, term for the insertion or the deletion of bases; MDS, myelodysplastic syndromes; miRNA, microRNA; MNase-seq, micrococcal nuclease (MNase) determines nucleosomal footprints and boundaries by pairing with NGS as a marker of chromatin accessibility; MRD, minimal residual disease; NGS, next-generation sequencing; PCR, polymerase chain reaction; RNA-seq, RNA sequencing; siRNA, short-interfering RNA; SNP, single nucleotide polymorphism; snoRNA, small nucleolar RNA; snRNA, small nuclear RNA; Tb, terabase, i.e., trillion base pairs; WGBS, whole-genome bisulfite sequencing; ZMW, zero-mode waveguide.
HISTORY OF GENOMICS: SANGER SEQUENCING
The scientific discipline known as genomics has dramatically changed since the publication of the Human Reference Genome in 2003, primarily as a result of the introduction and broad-based implementation of new sequencing technologies.1 Prior to the mid-2000s, Sanger sequencing was the predominant DNA sequencing approach, and was used to complete the sequencing of the first human reference genome. Frederick Sanger and his colleagues developed Sanger or “chain termination” sequencing in the late 1970s.2 In their original method, four reactions were used to accomplish chain termination by incorporating separate di-deoxynucleoside triphosphates (ddNTPs), each included with a mix of three unmodified deoxynucleoside triphosphates (dNTPs) and a fourth, radiolabeled dNTP. Each reaction consisted of the DNA template to be sequenced in a mixture containing a DNA primer, a DNA polymerase, a mixture of four dNTPs, and one of the four ddNTPs. Here, the chemistry of ddNTPs, which lack the 3′ hydroxyl group present in a native dNTP, resulted in chain termination when incorporated into a growing DNA chain, as DNA polymerase cannot add another nucleoside without the 3′ hydroxyl group present. With multiple rounds of primer elongation, the ddNTPs incorporate randomly in the newly synthesized strands according to the complementary nucleotides of the DNA ...