## Quantification Of Potential Confounding

The conceptual underpinnings of confounding concern counterfactual comparisons and exchangeability, but the focus in conducting and analyzing studies is on how much distortion confounding has produced in the measure of effect, and with what probability. We would like to know what the unconfounded measure is, and therefore wish to estimate how far the observed measure of effect is likely to deviate from that unconfounded value. Equivalently, the goal is to estimate the magnitude of confounding. If we obtain a risk ratio of 1.5 relating coffee drinking to the risk of bladder cancer, and have not adjusted for cigarette smoking or are concerned that we have not fully adjusted for it, we would like to be able to estimate how much of an impact the confounding might have relative to the unknown unconfounded measure of interest. How probable is it that the unconfounded measure of the risk ratio is truly 1.4 or 1.0 or 0.7? In the context of randomized exposure assignment, the probability of obtaining an aberrant allocation of subjects can be considered equivalent to the probability of confounding of a given magnitude, and the formal tools of statistical inference have direct applicability (Greenland, 1990). In contrast, in observational studies in which exposure is not randomly allocated, this assessment is based on informed speculation, quantitative if possible, but hypothetical in nature.

To move forward in understanding, controlling, and estimating the magnitude of uncontrolled confounding, specific sources of the confounding must be hypothesized. There is little benefit to noting that exposed and unexposed groups may differ in baseline disease risk for unspecified reasons. While this is always true, it is of no value given that without further specification the statement just constitutes an inherent feature of observational studies and to some extent, a feature of studies in which exposure is randomized. Instead, the basis for the confounding must be hypothesized in measurable terms to be useful in the interpretation of potential causal associations.

The magnitude of confounding due to an extraneous variable is a function of two underlying associations, namely that of the confounding variable with exposure and the confounding variable with disease. A full algebraic description of both of these associations predicts the direction and magnitude of confounding that the extraneous variable will produce. If both associations are fully known and quantified, the confounding can be measured and removed, which is the purpose of stratified or regression analyses, in which the confounder-exposure association is eliminated.
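As a sketch of what such an algebraic description can look like, consider the simplest case. The setup is an assumption introduced here for illustration, not something stated in the text: a single binary confounder whose risk ratio with disease, $RR_{CD}$, is the same among exposed and unexposed people, with prevalence $p_1$ among the exposed and $p_0$ among the unexposed. Writing $R_0$ for the risk among unexposed people without the confounder and $RR_{\text{true}}$ for the unconfounded risk ratio:

$$
\begin{aligned}
R_{\text{exposed}} &= R_0 \, RR_{\text{true}} \, \bigl[\, p_1 RR_{CD} + (1 - p_1) \,\bigr] \\
R_{\text{unexposed}} &= R_0 \, \bigl[\, p_0 RR_{CD} + (1 - p_0) \,\bigr] \\
RR_{\text{observed}} &= \frac{R_{\text{exposed}}}{R_{\text{unexposed}}}
  = RR_{\text{true}} \times \frac{p_1 (RR_{CD} - 1) + 1}{p_0 (RR_{CD} - 1) + 1}
\end{aligned}
$$

Under these assumptions the second factor is the distortion attributable to the confounder. It depends on the confounder-disease association through $RR_{CD}$ and on the confounder-exposure association through the difference between $p_1$ and $p_0$.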

In general, the magnitude of both the confounder-exposure and confounder-disease associations must be considered to assess the extent of confounding, not just one alone. At the extremes, however, meaningful inferences can be made based on knowledge regarding one of those associations. If there is either no confounder-exposure or no confounder-disease association present, that is, the potential confounding variable is not related to disease or it is not related to exposure, the magnitude of the other association is irrelevant: no confounding could possibly be present. If speculating about confounding factors in the smoking-lung cancer association, one might initially (and naively) ask about match carrying as a potential confounder given that it is such a strong correlate of tobacco smoking. We would find that carrying matches has no independent relation to lung cancer, however, and in the absence of an association with lung cancer, it cannot possibly be a confounder of the association between smoking and lung cancer. Similarly, there are clearly genetic factors that predispose to the development of lung cancer, but if it could be demonstrated that the distribution of those genetic factors were completely unrelated to cigarette smoking, a hypothesis to be tested empirically and not one to be casually dismissed as implausible, then the genetic factor could not confound the smoking-lung cancer association.

The other extreme case in which either the confounder-exposure or confounder-disease association yields definitive information regardless of the other is when the potential confounding variable is completely associated with exposure or disease. Regardless of the magnitude of the other association, when there is complete overlap of the confounder with exposure or disease, there is no opportunity to isolate the component of the association due to the exposure of interest from the association due to the confounding factor. In the above example, imagine that coffee drinkers were always smokers, and that the only non-coffee drinkers were never smokers. We would be unable to extricate the effect of coffee drinking from that of smoking and vice versa, even though in theory the observed association with disease may be wholly due to one or the other or partially due to both. One exposure might well confound the other but there would be no opportunity to measure or control that confounding. Similarly, if some condition is completely predictive of disease, such as exposure to asbestos and the development of asbestosis, then we cannot in practice isolate that exposure from others. We cannot answer the question, "Independent of asbestos exposure, what is the effect of cigarette smoking on the development of asbestosis?" The confounder-disease association is complete, so that we would be able to study only the combination and perhaps consider factors that modify the association.

In practice, such extreme situations of no association and complete association are rare. Potential confounding variables will more typically have some degree of association with both the exposure and the disease and the strength of those associations, taken together, determines the amount of confounding that is present. In examining the two underlying associations, the stronger association puts an upper bound on the amount of confounding that could be present and the weaker association puts a lower bound on the amount of confounding that is plausible. If one association is notably less well understood than the other, some inferences may still be possible based on estimates for the one that is known.

In practice, much of the attention focuses on the confounder-disease association, given that this association is often better understood than the confounder-exposure association. Epidemiologists typically focus on the full spectrum of potential causes of disease and less intensively on the ways in which exposures relate to one another. The strength of the confounder-disease association places an upper bound on the amount of confounding that could be present, which will reach that maximum value when the exposure and confounder are completely associated. That is, if we know that the risk ratio for the confounder and disease is 2.0, then the most distortion that the confounder could produce is a doubling of the risk. If we have no knowledge at all about the confounder-exposure association, we might infer that an observed risk ratio for exposure of 1.5 could be explained by confounding (i.e., the true risk ratio could be 1.0 with distortion due to confounding accounting for the observed increase), a risk ratio of 2.0 is unlikely to be fully explained (requiring a complete association between confounder and exposure), and a risk ratio of 2.5 could not possibly be elevated solely due to confounding.
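The bounding argument can be checked numerically. The sketch below is illustrative only and rests on assumptions not stated in the text: a single binary confounder, no effect modification, and the resulting distortion factor `(p1*(RR_CD - 1) + 1) / (p0*(RR_CD - 1) + 1)`, where `p1` and `p0` are the confounder's prevalence among exposed and unexposed. The function names are hypothetical.

```python
import math


def confounding_rr(rr_cd, p_exposed, p_unexposed):
    """Distortion factor from a binary confounder, assuming no effect modification.

    rr_cd:       confounder-disease risk ratio
    p_exposed:   prevalence of the confounder among the exposed
    p_unexposed: prevalence of the confounder among the unexposed
    """
    return (p_exposed * (rr_cd - 1.0) + 1.0) / (p_unexposed * (rr_cd - 1.0) + 1.0)


def max_confounding_rr(rr_cd, steps=101):
    """Largest distortion over a grid of prevalence pairs in [0, 1] x [0, 1]."""
    grid = [i / (steps - 1) for i in range(steps)]
    return max(confounding_rr(rr_cd, p1, p0) for p1 in grid for p0 in grid)


def verdict(observed_rr, rr_cd):
    """Can confounding alone account for an observed risk ratio above the null?"""
    bound = max_confounding_rr(rr_cd)
    if math.isclose(observed_rr, bound):
        return "only with complete confounder-exposure association"
    if observed_rr < bound:
        return "could be explained by confounding"
    return "cannot be due solely to confounding"


if __name__ == "__main__":
    rr_cd = 2.0  # confounder-disease risk ratio from the example in the text
    print("upper bound:", max_confounding_rr(rr_cd))
    for observed in (1.5, 2.0, 2.5):
        print(observed, "->", verdict(observed, rr_cd))
```

With a confounder-disease risk ratio of 2.0, the grid search finds its maximum of 2.0 where every exposed person and no unexposed person carries the confounder, which matches the three cases described above.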

As reflected by its dependence on two underlying associations between the potential confounding variable and disease and between the potential confounding variable and exposure, the algebraic phenomenon of confounding is indirect relative to the exposure and disease of interest. In contrast to misclassification or selection bias, which directly distorts the exposure or disease indicators and their association by shifting the number of observations in the cells that define the measure of effect, confounding is a step removed from exposure and disease. In order for confounding to be substantial, both the underlying associations, not just one of them, must be rather strong. Such situations can and do arise, but given the paucity of strong known determinants for many diseases, illustrations of strong confounding that produces spurious risk ratios on the order of 2.0 or more are not common.

The amount of confounding is expressed in terms of its quantitative impact on the exposure-disease association of interest. This confounding can be in either direction, so it is most convenient to express it in terms of the extent to which it distorts the unconfounded measure of association, regardless of whether that unconfounded value is the null, positive, or negative. Note that the importance of confounding is strictly a function of how much distortion it introduces, with no relevance whatsoever to whether the magnitude of change in the confounded compared to the unconfounded measure is statistically significant. Similarly, there is no reason to subject the confounder-exposure or confounder-disease associations to statistical tests given that statistical testing does not help in any way to evaluate whether confounding could occur, whether it has occurred, or how much of it is likely to be present. The sole question concerns the magnitude, not the precision, of the underlying associations.

The more relevant parameter to quantify confounding is the magnitude of deviation between the measure of association between exposure and disease with confounding present versus the same measure of association with confounding removed. We often use the null value of the association as a convenient benchmark of interest, but it is not the only one: "Given an observed association of a specified magnitude in which confounding may be present, how plausible is it that the true (unconfounded) association is the null value?" We might also ask: "Given an observed null measure of association in which confounding may be present, how likely is it that the unconfounded association takes on some other specific value?" Based on previous literature or clinical or public health importance, we might also ask: "How likely is it that the unconfounded association is as great as 2.0 or as small as 1.5?"

The amount of confounding can also be expressed in terms of the confounding risk ratio, which is the measure of distortion it introduces. This would be the risk ratio which, when multiplied by the true (unconfounded) risk ratio, would yield the observed risk ratio, i.e., RR(confounding) × RR(true) = RR(observed). If the true risk ratio were the null value of 1.0, then the observed risk ratio would be solely an indication of confounding whether above the null or below the null value. A truly positive risk ratio could be brought down to the null value or beyond, and a truly inverse risk ratio (<1.0) could be spuriously elevated to or beyond the null value. Quantitative speculation about the magnitude of confounding consists of generating estimates of RR(confounding) and the associated probabilities that those values occur.
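The multiplicative relation can be rearranged to ask how large the confounding risk ratio would have to be for a hypothesized true value to produce what was observed. The following is a minimal sketch using the coffee and bladder cancer figures from earlier in the section (observed 1.5; candidate true values 1.4, 1.0, and 0.7). The function names are hypothetical, and the code says nothing about how probable each scenario is.

```python
def true_rr(observed_rr, confounding_rr):
    """Unconfounded risk ratio implied by RR(confounding) x RR(true) = RR(observed)."""
    return observed_rr / confounding_rr


def required_confounding_rr(observed_rr, hypothesized_true_rr):
    """Confounding risk ratio needed for a hypothesized true value to yield the observed one."""
    return observed_rr / hypothesized_true_rr


if __name__ == "__main__":
    observed = 1.5  # observed risk ratio in the coffee and bladder cancer example
    for hypothesized in (1.4, 1.0, 0.7):
        needed = required_confounding_rr(observed, hypothesized)
        print(f"true RR {hypothesized}: requires RR(confounding) = {needed:.2f}")
```

For an observed risk ratio of 1.5, a true value of 1.4 requires a confounding risk ratio of about 1.07, a true value of 1.0 requires 1.50, and a true value of 0.7 requires about 2.14. Judging which of these is plausible then depends on what is known about the confounder's associations with exposure and disease.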
