KEY CONCEPTS

  • Machine learning is aimed at developing computer algorithms and statistical models to discover patterns underlying large, complex datasets (unsupervised learning) or to predict an outcome based on a set of predictors and a decision rule (supervised learning).

  • Deep learning (DL) is a family of machine learning methods based on multilayered artificial neural networks and has been shown to outperform classical supervised learning methods in many applications; however, it requires a large training dataset (at least thousands of samples) because of its model complexity.

  • Big data in oncology mainly arise from cancer genomics (such as The Cancer Genome Atlas project), medical imaging (such as magnetic resonance imaging and positron emission tomography–computed tomography), as well as the electronic health record.

  • Integrative genomic analysis is a powerful tool to study the biological mechanisms underlying a complex disease, such as cancer, across multiplatform high-dimensional data, such as DNA methylation, copy number variation, and gene expression.

  • Radiomics, an emerging field in imaging analysis for diagnosis, prognosis, or treatment response prediction, refers to the extraction of a large number of quantitative features, including shape features, first-order histogram features, and second-order texture features that capture spatial arrangement of the voxel intensities.

  • Assessment of prediction model performance should be based on the test error, that is, the prediction error over an independent test sample. By contrast, minimizing only the training error, that is, the prediction error over the sample used to train the model, leads to overfitting and over-optimism about model performance. Test error can be objectively evaluated by a proper training/test data split or by cross-validation.
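
As a minimal sketch of the last point, the Python example below contrasts training error with a cross-validated estimate of test error. It is illustrative only: the simulated data, the 20% label noise, the 1-nearest-neighbor classifier, and the helper names (`make_data`, `error_rate`, `k_fold_error`) are assumptions chosen for the example and do not come from this chapter.

```python
import random


def make_data(n, rng, noise=0.2):
    """Simulate (x, y) pairs: y is 1 when x > 0, with a fraction of labels flipped."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = 1 if x > 0 else 0
        if rng.random() < noise:  # label noise that a model should not memorize
            y = 1 - y
        data.append((x, y))
    return data


def one_nn_predict(train, x):
    """Predict the label of x as the label of its nearest training point (1-NN)."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


def error_rate(train, evaluate):
    """Fraction of points in `evaluate` misclassified by 1-NN fitted on `train`."""
    wrong = sum(1 for x, y in evaluate if one_nn_predict(train, x) != y)
    return wrong / len(evaluate)


def k_fold_error(data, k):
    """Average held-out error over k folds (k-fold cross-validation)."""
    folds = [data[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        held_out = folds[i]
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        errors.append(error_rate(train, held_out))
    return sum(errors) / k


rng = random.Random(0)  # fixed seed so the run is reproducible
data = make_data(200, rng)

# Training error: evaluate on the same sample used to fit the model.
training_error = error_rate(data, data)

# Cross-validated error: every point is evaluated by a model that never saw it.
cv_error = k_fold_error(data, 5)

print(f"training error:        {training_error:.3f}")
print(f"5-fold CV error:       {cv_error:.3f}")
```

Because each training point is its own nearest neighbor, the training error is zero even though roughly one in five labels is noise. The cross-validated estimate is clearly higher and is the more honest measure of how the rule would perform on new data.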

INTRODUCTION

The term artificial intelligence (AI) was coined by John McCarthy for the famous workshop Dartmouth Summer Research Project on Artificial Intelligence, held in 1956,1 and was defined as “the science and engineering of making intelligent machines.”2 As a subfield of AI, machine learning is the study of mathematical and statistical approaches that automatically improve the performance of computers through experience.3 Deep learning (DL), a revolutionary family of machine learning methods based on multilayered artificial neural networks (ANN),4 has driven a third boom in AI and is integrated into our daily lives through applications such as face recognition,5 self-driving cars,6 personal electronic assistants (Siri, Alexa, and Google Assistant), and intelligent medical diagnostics.7,8

MACHINE LEARNING

Machine learning, also referred to as statistical learning in the field of statistics, is aimed at developing computer algorithms and statistical models to discover patterns underlying large, complex datasets (unsupervised learning) or to predict an outcome based on a set of predictors and a decision rule (supervised learning).9 The development of machine learning has been bolstered by big data and the emerging analytic challenges in medicine, particularly oncology, that followed breakthroughs in high-throughput technologies, such as microarray profiling technologies in the late 1990s and early 2000s,10,11 next-generation sequencing technologies in ...
