TY  - CHAP
M1  - Book, Section
TI  - Big Data and Machine Learning in Oncology
A1  - Wei, Peng
A1  - Shu, Hai
A2  - Kantarjian, Hagop M.
A2  - Wolff, Robert A.
A2  - Rieber, Alyssa G.
Y1  - 2022
N1  - 
T2  - The MD Anderson Manual of Medical Oncology, 4e
AB  - KEY CONCEPTS: Machine learning is aimed at developing computer algorithms and statistical models to discover patterns underlying complex and large datasets (unsupervised learning) or to predict an outcome based on a set of predictors and a decision rule (supervised learning). Deep learning (DL) is a family of machine learning methods based on multilayered artificial neural networks and has been shown to outperform classical supervised learning methods in many applications; however, it requires a large training dataset (at least thousands of samples) because of its model complexity. Big data in oncology mainly arise from cancer genomics (such as The Cancer Genome Atlas project), medical imaging (such as magnetic resonance imaging and positron emission tomography–computed tomography), and the electronic health record. Integrative genomic analysis is a powerful tool to study the biological mechanisms underlying a complex disease, such as cancer, across multiplatform high-dimensional data, such as DNA methylation, copy number variation, and gene expression. Radiomics, an emerging field in imaging analysis for diagnosis, prognosis, or treatment response prediction, refers to the extraction of a large number of quantitative features, including shape features, first-order histogram features, and second-order texture features that capture the spatial arrangement of the voxel intensities. Assessment of prediction model performance should be based on the test error, that is, the prediction error over an independent test sample. On the other hand, minimizing the training error, that is, the prediction error over the sample used to train a model, will lead to overfitting and over-optimism about the model performance. Test error can be objectively evaluated by proper training/test data split or cross-validation.
SN  - 
PB  - McGraw Hill Education
CY  - New York, NY
Y2  - 2024/03/28
UR  - hemonc.mhmedical.com/content.aspx?aid=1190840793
ER  - 