There are two closely related processes that introduce bias into the comparison of exposed and unexposed subjects in cohort studies. When there is a distortion due to the natural distribution of exposures in the population, the mixing of effects is referred to as confounding. When there is a distortion because of the way in which our study groups were constituted, it is referred to as selection bias. In our hypothetical cohort study of dietary fat intake and prostate cancer, we may find that the highest consumers of dietary fat tend to be less physically active than those who consume lower amounts of dietary fat. To the extent that physical activity influences the risk of disease, confounding would be present not because we have chosen the groups in some faulty manner, but simply because these attributes go together in the study population. In contrast, if we chose our high dietary fat consumers from the labor union retirees, and identified low fat consumers from the local Sierra Club, men who are quite likely to be physically active, there would be selection bias that results in part from the imbalance between the two groups with respect to physical activity, but also quite possibly through a range of other less readily identified characteristics.
Confounding tends to be the focus when the source of non-comparability is measurable, at least in principle, and can therefore be adjusted for statistically. To the extent that the source of non-comparability can be identified, whether it arises naturally (confounding) or as the result of the manner in which the study groups were chosen (selection bias), its effects can be mitigated by statistical adjustment. When the concern is with more fundamental features of the groups to be compared, features that seem unlikely to be resolved through measurement of covariates and statistical control, we usually refer to the consequence of this non-comparability as selection bias.
The potential for selection bias depends entirely on the specific exposures and diseases under investigation, since it is the relation between exposure and disease that is of interest. Groups that seem on intuitive grounds to be non-comparable could still yield valid inferences regarding a particular exposure and disease, and groups that seem as though they would be almost perfectly suited for comparison could be problematic. Similarly, some health outcomes seem almost invariant with respect to the social and behavioral factors that influence many types of disease, whereas other diseases are subject to a myriad of subtle (and obvious) influences.
For example, incidence of acute lymphocytic leukemia in childhood varies at most modestly in relation to social class, parental smoking, or any other exposures or life circumstances examined to date (Chow et al., 1996). If we wished to assess whether the incidence of childhood leukemia in the offspring of men who received therapeutic ionizing radiation as treatment for cancer was increased, the selection of an unexposed group of men might be less daunting since the variability in disease incidence appears to be independent of most potential determinants studied thus far. That is, we might be reasonably confident that rates from general population registries would be adequate or that data from men who received medical treatments other than ionizing radiation would be suitable for comparison. In other words, the sources of non-comparability in the exposed and unexposed populations are unlikely to have much effect, if any, on the acute lymphocytic leukemia rates in the offspring.
In contrast, if we were interested in neural tube defects among the offspring of these men, we would have to contend with substantial variation associated with social class (Little & Elwood, 1992a), ethnicity (Little & Elwood, 1992b), and diet (Elwood et al., 1992). The same exposure in the same men would vary substantially in vulnerability to selection bias depending on the outcome of interest and what factors influence the risk of that outcome. Selection bias is a property of a specific exposure-disease association of interest, not an inherent property of the groups.
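The dependence on the outcome can be made concrete with a small calculation. The sketch below assumes a single binary background factor (for example, lower social class) that is more common in the exposed group, and it assumes the exposure itself has no effect. The function name, the prevalences, and the factor-outcome risk ratios are all hypothetical values chosen for illustration, not estimates from the studies cited above.

```python
def expected_crude_rr_under_null(p_exposed, p_unexposed, factor_rr):
    """Crude exposed/unexposed risk ratio expected when the exposure has
    no effect and the groups differ only on one binary background factor.

    p_exposed, p_unexposed: prevalence of the factor in each group.
    factor_rr: risk ratio relating the factor to the outcome.
    """
    # Average risk in each group, relative to people without the factor.
    relative_risk_exposed = p_exposed * factor_rr + (1 - p_exposed)
    relative_risk_unexposed = p_unexposed * factor_rr + (1 - p_unexposed)
    return relative_risk_exposed / relative_risk_unexposed


# Same (hypothetical) imbalance between the groups for both outcomes.
imbalance = {"p_exposed": 0.5, "p_unexposed": 0.25}

# Outcome that barely depends on the factor (assumed risk ratio of 1.0).
insensitive_outcome = expected_crude_rr_under_null(factor_rr=1.0, **imbalance)

# Outcome that depends strongly on the factor (assumed risk ratio of 3.0).
sensitive_outcome = expected_crude_rr_under_null(factor_rr=3.0, **imbalance)

print(insensitive_outcome, sensitive_outcome)
```

With these assumed inputs, the identical imbalance leaves the first comparison unbiased (a ratio of 1) and inflates the second to about 1.33, even though the exposure does nothing in either case.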
Despite the danger of relying solely on intuition, we often start with intuitive notions of group comparability based on geography, demographic characteristics, or time periods. Do the exposed and unexposed groups seem comparable? Social or demographic attributes are related to many health outcomes, so achieving comparability on these indicators may help to reduce the chance that the groups will be non-comparable in disease risk. If location, time period, and demography effectively predict comparability for a wide range of other unmeasured attributes, then the similarity is likely to be beneficial, on average, even if it provides no absolute assurance that the many unmeasured factors that might distinguish exposure groups are also balanced.
Sociodemographic or geographic comparability helps to ensure balance with respect to many known and unknown determinants of disease, but does non-comparability with regard to sociodemographic or other broad characteristics make it likely that selection bias is present? The answer depends entirely on the exposure and disease outcomes of interest and whether adjustment is made for readily identified determinants of disease risk. In fact, the more general question of whether imbalance between groups matters, i.e., whether it introduces bias, is most conveniently interpreted as a question of whether the imbalance introduces confounding. Is the attribute on which the groups are non-comparable associated with exposure, whether naturally (as in confounding) or due to the investigator's methods of constituting the study groups (as in selection bias)? Is the attribute associated with the disease of interest, conditional on adjusting for measured confounders? Just as natural imbalance on some attributes does not introduce confounding and imbalance in others does, some forms of inequality in the constitution of the study groups can be ignored and other forms cannot.
Continuing to focus on whether the selection of study groups introduces confounding, some sources of non-comparability are readily measured and controlled, such as gender or age, whereas others are quite difficult to measure and control in the analysis, such as health-care seeking or nutrient intake. The challenges, discussed in more detail in the chapter on confounding (Chapter 7), are to anticipate, measure, and control for factors that are independently related to exposure and disease. Whether non-comparability between the exposure groups based on measurable attributes is viewed as confounding or selection bias is somewhat arbitrary in cohort studies. In general, epidemiologists pay little attention to asking why the exposure and the confounder are associated, only asking whether they are. A true confounder could be controlled or exacerbated by the manner in which study groups are selected, as in matching (Rothman, 1986). What is critical to evaluating selection bias is to recognize that if the sources of potential bias can be measured and controlled as confounding factors, the bias that they introduce is removed. Some forms of selection that lead to non-comparability in the study groups can be eliminated by statistical adjustments that make the study groups comparable.
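To illustrate how adjustment for a measured source of non-comparability can remove the bias, here is a minimal sketch using the dietary fat and physical activity example. It uses the Mantel-Haenszel summary risk ratio, one common stratified estimator that the text itself does not name. All counts are invented: disease risk is set to depend on physical activity only (10% among the inactive, 5% among the active), so the true effect of dietary fat is null.

```python
# Hypothetical cohort: high-fat consumers (exposed) are mostly inactive,
# low-fat consumers (unexposed) are mostly active. Counts are invented.
strata = {
    "inactive": {"exp_cases": 80, "exp_total": 800,
                 "unexp_cases": 20, "unexp_total": 200},
    "active": {"exp_cases": 10, "exp_total": 200,
               "unexp_cases": 40, "unexp_total": 800},
}


def crude_rr(strata):
    """Risk ratio ignoring physical activity (strata collapsed)."""
    exp_cases = sum(s["exp_cases"] for s in strata.values())
    exp_total = sum(s["exp_total"] for s in strata.values())
    unexp_cases = sum(s["unexp_cases"] for s in strata.values())
    unexp_total = sum(s["unexp_total"] for s in strata.values())
    # (exp_cases / exp_total) / (unexp_cases / unexp_total), rearranged
    return (exp_cases * unexp_total) / (unexp_cases * exp_total)


def mantel_haenszel_rr(strata):
    """Mantel-Haenszel summary risk ratio across strata."""
    numerator = 0.0
    denominator = 0.0
    for s in strata.values():
        stratum_total = s["exp_total"] + s["unexp_total"]
        numerator += s["exp_cases"] * s["unexp_total"] / stratum_total
        denominator += s["unexp_cases"] * s["exp_total"] / stratum_total
    return numerator / denominator


print("crude RR:", crude_rr(strata))               # crude RR: 1.5
print("adjusted RR:", mantel_haenszel_rr(strata))  # adjusted RR: 1.0
```

The crude comparison suggests a 50% increase in risk among high-fat consumers, while the stratified estimate recovers the null. The same arithmetic applies whether the imbalance in physical activity arose naturally or from the way the groups were chosen, provided the attribute was measured.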