3.2.1 Incidence, Mortality, Case Fatality, and Age-Standardized Incidence Rates
An incidence rate refers to the number of new cases of a disease observed in a defined population in a specific time period divided by the population size, whereas mortality rate refers to the number of deaths caused by the disease during a specific time period divided by the population size. A more precise definition would use the term person time instead of population size. For example, in a town with a population of 40,000 the number of new cases of cancer in a year can be thought of as being divided by 40,000 person-years, 1 year of person time per individual, with the assumption that everyone lived there for the entire year (births, deaths, and migration during the year are ignored). Similarly, the mortality rate would be calculated as the number of deaths from cancer in a year divided by 40,000 person-years.
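The arithmetic behind the town example can be sketched in a few lines. This is a minimal illustration only: the case and death counts are hypothetical (the text gives none), and the function name `crude_rate` is introduced here for convenience.

```python
def crude_rate(events: int, person_years: float, per: int = 100_000) -> float:
    """Return the number of events per `per` person-years."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return events / person_years * per


# The town from the text: 40,000 residents, each contributing 1 year
# (births, deaths, and migration during the year are ignored).
person_years = 40_000 * 1.0

# Hypothetical counts, chosen only to make the arithmetic concrete.
new_cases = 200
cancer_deaths = 80

incidence = crude_rate(new_cases, person_years)
mortality = crude_rate(cancer_deaths, person_years)

print(f"Crude incidence: {incidence:.1f} per 100,000 person-years")
print(f"Crude mortality: {mortality:.1f} per 100,000 person-years")
```

Expressing the denominator as person-years rather than a head count means the same function still applies when individuals contribute different amounts of follow-up time.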
A direct calculation of incidence rate, as in the example above, is termed the crude incidence rate, because the calculation is performed without consideration of other important factors that may differ across populations. A major factor in the comparison of populations is the difference in their age distributions. Both cancer incidence and mortality generally increase markedly with age, and comparisons of populations with different age distributions using crude rates can therefore be misleading. One can statistically eliminate or reduce the effect of age in the calculation of the incidence of cancer (or of any other disease), and so allow comparison of cancer incidence in communities or geographical regions with different age distributions, or in the same community as its age distribution changes. The procedure requires adjusting (or standardizing) rates so they are representative of the age distribution of some reference population. The reference population can be chosen arbitrarily to have any age distribution, but for convenience the standard is typically the age distribution of the relevant country at a particular census date (eg, the U.S. year 2000 census). Different age-standardized rates can then be compared with each other, provided they are adjusted to the same reference standard. Figures 3–1 and 3–2 show examples of age-standardized cancer incidence.
Comparison of age-standardized incidence rates of selected cancers in selected countries, by sex, per 100,000 person-years. X-axis is divided into different continents: CAN, Canada; US-CAUC, United States Caucasians; US-AA, United States African American; SWE, Sweden; GER, Germany; JAP, Japan; CHN, China (Shanghai Registry); IND, India; UGA, Uganda; EGY, Egypt (Gharbiah Registry). Z-axis: dark blue, gastroesophageal cancer; red, colorectal cancer; green, liver cancer; purple, lung cancer; light blue, prostate cancer (men), breast cancer (women).
Age-standardized incidence and mortality rates across 20 years, and incidence, death and prevalence estimates for the year 2000, Canada. (Top) Age-standardized incidence (solid lines) and mortality (dashed lines) rates for lung (blue), colorectal (green), breast (dark red), and prostate cancer (bright red) in Canada, per 100,000 person years for years 1981 to 2010. (Bottom Table) For the year 2000, incident cases, cancer-specific deaths, and number of prevalent cases. *One-year prevalence, 2000. (Data from Canadian Cancer Statistics, 2010.)
The age-standardized rate is commonly presented in publications as it accounts for differences in age between populations or changes in age within populations over time. However, such adjusted rates should be viewed as relative indices rather than actual measures of occurrence. Rates can also be compared within age groups.
The age-specific incidence rate is defined as the incidence of disease in a specific age group; typically, 5-year age groups (0 to 4, 5 to 9, 10 to 14, etc) are used. This method is informative about the disease pattern over the life course, but a valid comparison across populations can only be made within age groups.
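To make the link between age-specific and age-standardized rates concrete, the following sketch applies direct standardization: each age-specific rate is weighted by the share of that age group in a reference population. All numbers are hypothetical, the three broad age bands stand in for the usual 5-year groups, and the standard weights are invented for illustration rather than taken from any census.

```python
def crude_rate(cases, person_years, per=100_000):
    """Total cases over total person-years, ignoring age structure."""
    return sum(cases) / sum(person_years) * per


def direct_standardized_rate(cases, person_years, standard_weights, per=100_000):
    """Weight each age-specific rate by the reference population's age share."""
    total = sum(standard_weights)
    shares = [w / total for w in standard_weights]
    age_specific = [c / py for c, py in zip(cases, person_years)]
    return sum(rate * share for rate, share in zip(age_specific, shares)) * per


# Hypothetical age bands: 0-39, 40-64, 65+ years.
standard_weights = [0.55, 0.30, 0.15]   # invented reference age distribution

# Two hypothetical populations with identical age-specific rates
# (10, 200, and 1000 per 100,000) but different age structures.
young_py, young_cases = [70_000, 25_000, 5_000], [7, 50, 50]
old_py, old_cases = [40_000, 35_000, 25_000], [4, 70, 250]

crude_young = crude_rate(young_cases, young_py)
crude_old = crude_rate(old_cases, old_py)
asr_young = direct_standardized_rate(young_cases, young_py, standard_weights)
asr_old = direct_standardized_rate(old_cases, old_py, standard_weights)

print(f"Crude rates:        {crude_young:.1f} vs {crude_old:.1f}")
print(f"Standardized rates: {asr_young:.1f} vs {asr_old:.1f}")
```

In this constructed example the crude rates differ roughly threefold purely because of age structure, while the standardized rates coincide. The standardized value depends on the reference weights chosen, which is why it is better read as a relative index than as an actual measure of occurrence.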
The case fatality rate is the number of deaths caused by a specific disease in a defined population divided by the number of individuals who have been diagnosed with that particular disease, in a fixed time interval. Only deaths among those diagnosed in the time interval are included in the numerator. An example of a case fatality rate is found with breast cancer, which in the Western world is around 25 individuals per 100, or 25%. This means that within a given year, of every 100 individuals carrying a diagnosis of breast cancer in a population, 25 individuals in the same population will have died of breast cancer. Contrast this result with pancreatic cancer, where the case fatality rate is more than 95%; that is, of 100 individuals carrying a diagnosis of pancreatic cancer, more than 95 patients will have died.
Prevalence is defined as the proportion of a population that has a disease at a specific time point (prevalent cases divided by population size), where prevalent cases include both new cases of disease and previously diagnosed cases who are still alive in the population. Including all cases who are still alive as prevalent cases assumes that no cases can be considered cured. This raises the question of whether long-term survivors of cancer should be included in the prevalence group when calculating prevalence. Given the available data (which may not include treatment information), decisions should be made that ensure the prevalence calculation reasonably reflects the burden of disease in the population. For example, one can define prevalent cases as those diagnosed within a specified time period (eg, the last 5 years) and still alive at the end of that period.
Prevalence is a function of both cancer incidence and cancer survivorship (ie, how long the person lives with the disease or disease chronicity). As an example, although lung cancer ranks among the most common cancers in the Western world, its prevalence ranks lower than that of prostate cancer, breast cancer, and colorectal cancer, because lung cancer is such a lethal disease. Both cancer incidence and cancer prevalence are important measures. Cancer incidence reflects how commonly cancer develops in a population, and the impact of preventive measures and health service utilization related to initial diagnosis and treatment. In contrast, cancer prevalence may be important when considering the overall burden of a cancer on a global health system, and longer-term impact on survivors.
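The dependence of prevalence on both incidence and duration can be illustrated with a simple steady-state approximation, in which the prevalence odds equal the incidence rate multiplied by the mean duration of disease. This relation is a standard simplification that the text does not state; it assumes a stable population with constant incidence and duration. The two cancers and their figures below are hypothetical.

```python
def steady_state_prevalence(incidence_rate: float, mean_duration_years: float) -> float:
    """Prevalence proportion under the steady-state assumption.

    Prevalence odds P / (1 - P) = incidence rate x mean duration,
    so P = (I * D) / (1 + I * D).
    """
    odds = incidence_rate * mean_duration_years
    return odds / (1 + odds)


# Hypothetical cancers (rates are per person-year).
lethal = steady_state_prevalence(incidence_rate=60 / 100_000, mean_duration_years=1.5)
indolent = steady_state_prevalence(incidence_rate=50 / 100_000, mean_duration_years=10.0)

print(f"Lethal cancer:   {lethal * 100_000:.1f} prevalent cases per 100,000")
print(f"Indolent cancer: {indolent * 100_000:.1f} prevalent cases per 100,000")
```

Under these assumptions the cancer with the higher incidence has the lower prevalence, because its cases spend much less time in the prevalent pool.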
Figure 3–2 shows the relationship between incidence, mortality, and prevalence. In lung cancer, the dashed (mortality rate) line approaches the solid (incidence rate) line, which represents a low survival rate (of approximately 10% to 20%). The low survival rate results in a low prevalence of disease in the population, as a high proportion of lung cancer patients die of their disease, resulting in few long-term survivors. For prostate, breast, and colorectal cancers, the age-standardized incidence rates are substantially higher than their corresponding mortality rates, as a result of high survival rates (all greater than 40%). In the figure, both the age-standardized incidence of prostate cancer and the incidence–mortality gap for prostate cancer in men were similar to the same indicators for breast cancer in women. Because prevalence is a function of incidence and survivorship, one would have expected a similar prevalence of prostate cancer and breast cancer in 2000. Yet the prevalence of breast cancer was approximately 50% higher than that of prostate cancer. The reasons for this finding are (a) women live longer than men, on average, and (b) the median age at diagnosis of breast cancer is lower than that of prostate cancer. Thus, in the year 2000, a breast cancer patient could expect to live longer, on average, than a prostate cancer patient, leading to a greater prevalence of breast cancer than of prostate cancer (see Fig. 3–2). This aspect of survivorship links incidence, mortality, case fatality rates, and survivorship characteristics with prevalence rates.
3.2.3 The Role of Sampling
A major focus of epidemiology is to identify associations that are true for an entire population. Optimally, one would collect exposure and disease status data from every member of that population. If the information collected is accurate, then any associations found would be true.
In the real world, it is not feasible to collect data from the entire population, and a subset of the population is studied. Even the most comprehensive national census data from countries that make completing a census mandatory will have certain individuals refusing to comply (typically the disenfranchised, those not in the country legally, and any groups that are suspicious of government oversight). In many cases, because of cost and feasibility, basic census data are collected from as many individuals as possible, while detailed comprehensive information is collected from a subset of the population. Sampling is therefore a key component of epidemiological analyses. The goal of sampling is to evaluate a subset of the population where the exposure and disease status information is representative of the underlying population. In an ideal setting, the results found in the sample should fully reflect the true associations in the underlying population. When the results are different, bias and measurement errors may explain these discrepancies.
A study is biased if its results differ systematically from the truth. In epidemiology, bias can be viewed as a distortion of risk estimates from their true values. Bias can be related to the identification of cases, measurement of exposure, improper analysis of results, or systematic errors in data collection and entry. Many different kinds of biases have been described (Sackett, 1979; Szklo and Nieto, 2007). We describe the main types of bias found in observational studies, including confounding, selection, and information bias.
3.2.4.1 Bias Because of Confounding
An important bias in observational studies comes from confounding, defined as the distortion of the effect of an exposure on risk (of disease or outcome) that arises because of an association with other factors that affect that risk. Confounding can lead to spurious associations, mask associations that are real, or distort the strength of an association. A variable is considered to be a confounder if it is associated with the potential disease-related factor under investigation (either causally or noncausally) and is causally related to the outcome of interest (either risk of disease or its outcome). An example of a confounder is smoking in lung cancer (Fig. 3–3A). Suppose we are studying the association between tooth loss and lung cancer risk. Tooth loss, a marker of poor hygiene, is strongly associated with heavy smoking. We may therefore find an association between tooth loss and lung cancer solely because both are associated with heavy smoking. In reality, tooth loss does not lead to lung cancer development, but its association with smoking makes it appear to be related to lung cancer risk, whereas the true association is between smoking and lung cancer.
Confounding in epidemiologic studies. The confounder is related to both the exposure of interest and to either disease or outcome. Example A reflects confounding by smoking. Example B is not an example of confounding because bronchial dysplasia lies in the causal pathway leading to lung cancer.
A variable is not considered to be a confounder if it lies in the same causal pathway as the potential disease-related factor under investigation. For example, bronchial dysplasia is an intermediary in the pathway between smoking and lung cancer, and is thus not a confounder (see Fig. 3–3B).
In summary, there are 3 criteria for a variable to be a confounder:
A confounding factor must be a risk factor for the disease;
A confounding factor must be associated with the exposure under study in the source population; and
A confounding factor should not be an intermediate factor in a causal path between exposure and disease.
Confounding can be dealt with in different ways. Individuals who have a disease (eg, cancer cases) and those without disease (eg, healthy controls) can be matched on potential confounding exposure variables (eg, age and sex are commonly matched in a case-control study) to reduce or eliminate confounding by these variables, or data can be analyzed within specific strata of the confounding variable (eg, analyses stratified by ethnic group). In addition, one can control for confounding using multiple regression analysis, which is discussed in Section 3.3.3.
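The stratified approach can be sketched with the tooth loss example. The counts below are hypothetical case-control data constructed so that tooth loss is unrelated to lung cancer within each smoking stratum. The Mantel-Haenszel pooled odds ratio used to combine the strata is one common choice that the text does not name; it appears here only as an illustration.

```python
from typing import NamedTuple


class Table2x2(NamedTuple):
    a: int  # cases with tooth loss
    b: int  # cases without tooth loss
    c: int  # controls with tooth loss
    d: int  # controls without tooth loss


def odds_ratio(t: Table2x2) -> float:
    return (t.a * t.d) / (t.b * t.c)


def mantel_haenszel_or(strata: list[Table2x2]) -> float:
    """Pooled odds ratio across strata of the confounder."""
    numerator = sum(t.a * t.d / sum(t) for t in strata)
    denominator = sum(t.b * t.c / sum(t) for t in strata)
    return numerator / denominator


# Hypothetical data, stratified by the confounder (heavy smoking).
heavy_smokers = Table2x2(a=120, b=80, c=60, d=40)
non_smokers = Table2x2(a=10, b=40, c=60, d=240)
strata = [heavy_smokers, non_smokers]

# Collapsing the strata ignores smoking and yields the crude table.
crude = Table2x2(*(sum(cells) for cells in zip(*strata)))

crude_or = odds_ratio(crude)
stratum_ors = [odds_ratio(t) for t in strata]
adjusted_or = mantel_haenszel_or(strata)

print(f"Crude OR: {crude_or:.2f}")
print(f"Stratum-specific ORs: {[round(x, 2) for x in stratum_ors]}")
print(f"Mantel-Haenszel OR: {adjusted_or:.2f}")
```

In these invented data the crude odds ratio suggests that tooth loss more than doubles the odds of lung cancer, yet the association disappears within each smoking stratum and in the pooled estimate, which is the signature of confounding by smoking.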
3.2.4.2 Selection and Information Bias
Sometimes the results of analyses of a sample will differ from the true associations in the underlying population. This may be a result of sampling problems, whereby the sample selected does not represent the underlying population. Selection bias refers to systematic differences between those who participate in a study and those who are theoretically eligible for it (including those who do not participate). An example of sampling bias results from recruiting cases from a surgical clinic to represent the entire population of patients with stomach cancer. Because surgeons generally see more early-stage patients (ie, those who are operable), the sample will be skewed toward earlier-stage patients, whereas the whole population of patients with stomach cancer is eligible for the study.
Information bias occurs as a consequence of errors in obtaining the needed information, which is often termed misclassification or classification error. Sometimes these misclassifications can lead to results from studies that do not represent the true associations in the underlying population. An example of information bias occurs if lung cancer patients overestimate their exposures to asbestos (compared with healthy controls) while underestimating their own cigarette smoking history (perhaps as a means of reducing their own culpability in developing this disease). The resultant effect is a smaller-than-true risk associated with cumulative smoking, and an exaggerated risk associated with asbestos. This bias particularly affects case-control studies (see Sec. 3.4.3) where cases are recruited after their diagnosis, and is referred to as recall bias. Such a bias would be absent if individuals were asked for their exposure status prior to developing their cancer (as in the case of cohort studies; see Sec. 3.4.2). Another example of information bias may come from evaluating a molecular test. Assume that the molecular test categorizes individuals into 3 levels: A, B, and C. However, because the test is inappropriately calibrated, a number of B test results are misclassified as C results, whereas B results are never misclassified as A results. The resultant error is directional in nature (ie, nonrandom).
Although systematic errors such as selection bias and differential misclassification can generate biases in epidemiological studies, random error can also distort the results. Random error is the deviation that arises by chance between the observed value (in the sample) and its true value (in the underlying population). The greater the random error, the less precise the result.
Assessment of cancer diagnosis should be reasonably accurate, as diagnosis is generally verified with a pathology report. The determination of cause of death can be more problematic if death certificates are used. Assessment of exposure can be particularly problematic, and misclassification of subjects with respect to their exposure can be extensive. Recall of certain past exposures, such as diet, may show considerable random error. Error in the measurement of biomarkers depends not only on the accuracy of the bioassay, but also on how well a single measurement reflects long-term levels of the biomarker. The latency period for cancer can be many years, and a single measurement of a biomarker during an individual's lifetime may not effectively represent long-term levels.
Generally, misclassification of a dichotomous (ie, positive or negative) exposure that occurs to the same degree in cases and controls (nondifferential misclassification) will lead to a bias toward the null (relative risk estimates will indicate a smaller association than actually exists, or no association when a true association is present), and when the misclassification is extreme the estimate can go beyond the null in the opposite direction. However, misclassification of a multilevel exposure could lead to errors in any direction. Efforts to increase the accuracy of exposure assessment, or the use of large samples that can detect the attenuated associations, are the only ways to address this problem. We discuss some of the newer strategies in Section 3.7.2. Table 3–1 defines and describes other common examples of selection and information biases.
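The attenuation produced by misclassifying a dichotomous exposure can be shown with expected counts. The sketch below assumes that exposure is measured with the same sensitivity and specificity in cases and controls (that is, the error is unrelated to disease status). The true counts and the sensitivity and specificity values are hypothetical.

```python
def misclassify(exposed: float, unexposed: float,
                sensitivity: float, specificity: float) -> tuple[float, float]:
    """Expected observed (exposed, unexposed) counts after measurement error."""
    observed_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    observed_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return observed_exposed, observed_unexposed


def observed_odds_ratio(cases, controls, sensitivity, specificity) -> float:
    """Odds ratio computed from the misclassified 2x2 table."""
    a, b = misclassify(*cases, sensitivity, specificity)
    c, d = misclassify(*controls, sensitivity, specificity)
    return (a * d) / (b * c)


# Hypothetical true counts: (truly exposed, truly unexposed).
cases = (200, 300)
controls = (100, 400)

true_or = observed_odds_ratio(cases, controls, 1.0, 1.0)      # perfect measurement
moderate_or = observed_odds_ratio(cases, controls, 0.8, 0.9)  # moderate error
extreme_or = observed_odds_ratio(cases, controls, 0.3, 0.4)   # worse than chance

print(f"True OR: {true_or:.2f}")
print(f"OR with moderate misclassification: {moderate_or:.2f}")
print(f"OR with extreme misclassification: {extreme_or:.2f}")
```

With moderate error the odds ratio shrinks toward 1, and when sensitivity plus specificity falls below 1 the estimate crosses the null and points in the opposite direction.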
TABLE 3–1 Types of selection and information biases in epidemiological studies.
| Bias* | Study Design | Description |
|---|---|---|
| Selection Bias | | |
| Admission rate (Berkson) bias | Hospital-based case-control studies | Admission rate of cancer patients differs with respect to exposure to potential disease-related factor under investigation. Exposed cases may be over- or underrepresented in sample. |
| Prevalence-incidence (or length or survival) bias | Cross-sectional studies, case-control studies of rapidly fatal cancers | Survival of cases is related to exposure. Exposed cases may be over- or underrepresented in sample. |
| Detection bias | Case-control studies | Detection of cases is related to exposure to potential disease-related factor, with cases in exposed group over- or underrepresented. |
| Bias related to selection of cases and controls from different catchment areas | Hospital-based case-control studies | Cancer patients that visit hospital arise from different region than controls selected among other patients. Distribution of potential disease-related factor may differ in the 2 underlying populations. |
| Sampling or ascertainment bias | All types | Some members of the population may be less likely to be included than others, resulting in a nonrandom sample. |
| Differential loss to follow-up | Cohort studies, survival analyses | Subjects with exposure are either more or less likely to be lost to follow-up (losses can be a result of mortality, migration, or refusal to continue with study). |
| Lead-time bias | Cohort studies, survival analyses | The appearance of prolonged survival as a result of earlier diagnoses because of earlier detection of the disease, without impacting actual outcome of treating the disease. |
| Overdiagnosis bias | All types | The appearance of increase in early stage disease with improved survival, because of new detection technologies that identify previously undiagnosed subclinical disease that would otherwise never have required treatment. |
| Information Bias | | |
| Recall bias | Case-control studies (with interview/questionnaires) | Cases recall exposure differently than controls, either over- or underreporting exposure relative to controls. |
| Interviewer or experimenter's bias | Case-control studies | Interviewer/experimenter knows disease status of study subjects and over- or underreports exposure, either consciously or unconsciously affecting the results. |
3.2.4.3 Bias in Cancer Screening
Two special causes of bias are related to cancer screening. These biases can affect incidence, prevalence, mortality, and survival rates. For example, as Figure 3–2 shows, prostate cancer incidence rates had 2 separate peaks (1993 and 2001). Each peak was related to the clinical adoption of a screening test based on the serum levels of prostate-specific antigen (PSA). The first peak followed initial adoption of the PSA test, while the second peak may be explained by increased PSA testing related to the publicity around a prominent Canadian politician's prostate cancer diagnosis (Canadian Cancer Society, 2010). New screening techniques may result in individuals with subclinical prostate cancer being diagnosed earlier than traditionally expected. In the absence of the screening, the prostate cancers would be detected at a later date, when the subclinical cancer grows large enough to produce symptoms and be detected using previous methods. At the time of clinical adoption of such screening, survival may be prolonged because there is a true benefit of screening and the cure rate has truly risen. However, 2 potential biases complicate the interpretation of findings. Lead-time bias and overdiagnosis bias may have accounted partly for these peaks.
Lead-time bias refers to the appearance of longer survival after diagnosis that results from diagnosis at an earlier point in the course of the disease: the patient is known to have the cancer for a longer time, but the response to treatment has not improved. The fact that early detection may not benefit the patient clinically, because the patient may die at the same time with or without screening, is an important consideration when evaluating cancer screening programs. In the context of screening, length-time bias occurs when a screening program preferentially detects subjects with a better prognosis. This can result from more rapidly growing (and more lethal) cancers being diagnosed outside of the screening program, thus leading to an impression of better survival among screened subjects (see Chap. 22, Sec. 22.3.3).
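Lead-time bias reduces to simple arithmetic on a single patient's timeline. In the sketch below, all ages are hypothetical, and the patient is assumed to die at the same age whether or not the cancer is found by screening.

```python
def survival_after_diagnosis(age_at_diagnosis: float, age_at_death: float) -> float:
    """Years lived between diagnosis and death."""
    return age_at_death - age_at_diagnosis


# Hypothetical patient whose age at death is unaffected by screening.
age_at_death = 70
age_at_symptomatic_diagnosis = 67   # diagnosed once symptoms appear
age_at_screen_detection = 63        # same cancer, found earlier by screening

survival_unscreened = survival_after_diagnosis(age_at_symptomatic_diagnosis, age_at_death)
survival_screened = survival_after_diagnosis(age_at_screen_detection, age_at_death)
lead_time = age_at_symptomatic_diagnosis - age_at_screen_detection

print(f"Survival after symptomatic diagnosis: {survival_unscreened} years")
print(f"Survival after screen detection: {survival_screened} years")
print(f"Lead time: {lead_time} years")
```

Measured survival is longer with screening by exactly the lead time, although the patient's life has not been extended.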
Screening can also lead to overdiagnosis bias. PSA screening may lead to the detection of subclinical cases of prostate cancer that would never have been diagnosed clinically, in individuals who would eventually have died from an unrelated cause. Overdiagnosis results in an increase in cancer incidence, apparently prolonged survival after diagnosis (and, therefore, an apparent decrease in case fatality rates), and greater prevalence of the disease, all as a result of the new detection of previously subclinical disease that has no real clinical relevance.
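The effect of overdiagnosis on the summary measures can be seen in a deliberately simplified calculation. All counts are hypothetical, deaths are related to cases diagnosed in the same year for simplicity, and screening is assumed to add only cases that would never have caused death.

```python
def summarize(cases: int, deaths: int, population: int) -> dict[str, float]:
    """Incidence, mortality, and case fatality for one year."""
    return {
        "incidence_per_100k": cases / population * 100_000,
        "mortality_per_100k": deaths / population * 100_000,
        "case_fatality_pct": deaths / cases * 100,
    }


population = 100_000
clinically_relevant_cases = 100   # hypothetical
deaths_from_cancer = 30           # hypothetical
overdiagnosed_cases = 50          # hypothetical; none of these cause death

before = summarize(clinically_relevant_cases, deaths_from_cancer, population)
after = summarize(clinically_relevant_cases + overdiagnosed_cases,
                  deaths_from_cancer, population)

for label, result in (("Before screening", before), ("After screening", after)):
    print(label, {name: round(value, 1) for name, value in result.items()})
```

Incidence rises and case fatality falls after screening is introduced, while mortality is unchanged. That pattern is what distinguishes overdiagnosis from a true screening benefit in this sketch.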
3.2.5 Geographic Variation
Geographic variations in cancer incidence can result from differences in the prevalence of the underlying causes, including environmental and ethnic (ie, genetic) differences, or from differences in diagnostic criteria. In addition, geographic comparison can be complicated by differences in screening, which, by detecting occult disease, usually has a much larger effect on the incidence of disease than on mortality (see Chap. 22, Sec. 22.3.3). Figure 3–1 shows the age-standardized incidence rate for selected cancer sites and countries. Some large variations can be observed across countries, and there may also be large variations within countries: for example, the rate of esophageal cancer varies by 10-fold within Iran (Saidi et al, 2000). In another example, shown in Figure 3–4, there is substantial geographic variation in incidence rates of liver cancer (Ferlay et al, 2010). The highest incidence rates are observed in sub-Saharan Africa and in Asian countries such as China (~25 per 100,000 person years), Thailand (~30 per 100,000 person years), and Taiwan (~35 per 100,000 person years), whereas lower rates are observed in Europe and North America. This variation is partially accounted for by the prevalence of chronic infection with hepatitis B and C viruses (HBV and HCV), which are causally associated with 80% to 95% of liver (hepatocellular) cancer (Maupas and Melnick, 1981). Similarly, the variation of cervical cancer can be partially accounted for by the prevalence of human papillomavirus (HPV), as we now know that cervical cancer is associated strongly with a few of the oncogenic genotypes of HPV (Munoz et al, 2003). The relationship between infection and cancer is described in more detail in Section 3.5.1 (see also Chap. 6, Sec. 6.2.3).
Global variation in incidence of liver cancer. Annual age-standardized incidence rate of liver cancer, per 100,000 person years, across different countries. (From Ferlay et al, 2010 with permission.)
Figure 3–2 shows the age-standardized incidence rate (ASIR) of the most common cancer sites in Canada for males and females in the last 20 years. Lung cancer incidence rates in men have been decreasing steadily since the mid-1980s, from approximately 90 to 65 per 100,000 person years in 2010, whereas the lung cancer incidence rate in women has continued to rise, from approximately 25 to 48 per 100,000 person years in 2010 (Canadian Cancer Society, 2010). The long-term projection suggests that this trend is beginning to level off. This pattern corresponds to the patterns of tobacco consumption in men and women with a lag time of approximately 20 years. In contrast, colorectal cancer rates have remained relatively stable over the same period. Breast cancer incidence has slightly increased during this period. Changes over time for prostate cancer incidence rates have already been discussed in Section 3.2.4.3.
Worldwide, the incidence rate of stomach cancer in men has been decreasing in the last 30 to 40 years. Despite the steady decline, stomach cancer was still the fourth most common incident cancer worldwide in 2008, following cancers of the lung, breast, and colorectum (Ferlay et al, 2010). In contrast, the incidence of thyroid cancer is increasing more rapidly than that of any other cancer, and it has doubled in women in the last 10 years, in both Europe and parts of the United States (Lundgren et al, 2003; Davies and Welch, 2006). The increase in incidence of thyroid cancer is mainly observed for papillary thyroid cancer, and it may be a result of the change in the morphological recognition of this tumor (Lundgren et al, 2003). More frequent use of medical imaging may also contribute to the increased detection of early stage, asymptomatic cancers. The mortality of thyroid cancer did not show any increase during the same period of time.
In addition to adult cancers, recent publications based on the Automated Cancer Information System focused on childhood cancer have provided detailed statistics of major childhood cancer in Europe between 1978 and 1997 (Kaatsch et al, 2006). This analysis, based on 33 cancer registries in 15 European countries, showed an increased rate of childhood cancer in all regions for the majority of tumor types, including soft-tissue sarcoma (annual rate of increase 1.8%), brain tumors, tumors of the sympathetic nervous system, germ cell tumors, and leukemias (annual rate of increase 0.6%). Diagnostic methods can only partially explain the upward trend, and factors such as changing lifestyle and environmental exposures may be important.