In addition to the special substantive issues that apply to disease misclassification and its consequences, there are a number of methodological issues that need to be appreciated to assess the consequences of different patterns of misclassification. A key distinction is between subtypes of disease misclassification that are invariant with respect to exposure (non-differential misclassification of disease) and those that differ as a function of exposure status (differential misclassification of disease).
The result of nondifferential disease misclassification depends on the type of design (cohort versus case-control), whether the error is due to disease underascertainment (false negatives) or overascertainment (false positives), and the measure of association (ratio or difference). The general principle is that nondifferential misclassification in a dichotomous variable tends to produce bias towards the null (Rothman, 1986). Whatever the value would have been without misclassification, whether above or below the null value, nondifferential misclassification in a dichotomous variable will most often bias the effect estimate by moving it closer to the null value (Kleinbaum et al., 1982). If the association is truly inverse, then the bias will be upward toward the null, and if the association is positive, then the bias will be downward toward the null. While this rule applies to many of the circumstances in which disease misclassification occurs, there are also some important exceptions to the rule in which no bias is expected to occur on average (Poole, 1985). Situations in which bias is absent should be identified and even sought out if the investigator or evaluator of the data has such an opportunity.
The consequences of erroneous assessment of disease depend on the study design. In a case-control study, the process by which potential cases are identified needs to be examined (Brenner & Savitz, 1990). Underascertainment of disease, if non-differential with respect to exposure, is tantamount to randomly sampling cases. In other words, a disease assessment mechanism that has a sensitivity of 80% is functionally equivalent to having decided to randomly sample 80% of eligible cases. The exposure prevalence among cases should not be altered due to underascertainment, though precision will be reduced due to the unnecessary loss of otherwise eligible cases. On the other hand, if undiagnosed cases remain under consideration as eligible potential controls past the time of disease onset, they will introduce selection bias since they have the exposure prevalence expected of cases and should have been removed from the study base once their disease began. Only under the null hypothesis, when exposure prevalence is identical among cases and the study base, will no bias result. Alternatively, if those cases who were erroneously not identified (and thus excluded) can be identified and omitted from the study base from which controls are sampled, then this bias can be averted. Inclusion of cases in the study base from which controls are to be sampled after their disease has begun will yield a biased sample. For reasonably rare diseases, however, the proportion of false negative cases among the pool of controls should have a negligible quantitative impact on the results.
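The two effects described above can be checked with simple arithmetic. The sketch below assumes a hypothetical closed study base, controls drawn as a representative sample of everyone not enrolled as a case, and expected counts rather than random draws. The function name `case_control_or` and all numbers are illustrative assumptions, not values from the text.

```python
def case_control_or(n_exposed, n_unexposed, risk_exposed, risk_unexposed,
                    sensitivity, remove_missed_cases):
    """Expected odds ratio when a fraction `sensitivity` of true cases is found.

    Controls are assumed to be a representative sample of everyone not
    enrolled as a case. If `remove_missed_cases` is True, undetected cases
    are also excluded from the pool that controls are drawn from.
    """
    true_cases_exp = n_exposed * risk_exposed
    true_cases_unexp = n_unexposed * risk_unexposed

    # Nondifferential underascertainment: same sensitivity in both groups.
    found_exp = sensitivity * true_cases_exp
    found_unexp = sensitivity * true_cases_unexp

    # People removed from the pool from which controls are sampled.
    removed_exp = true_cases_exp if remove_missed_cases else found_exp
    removed_unexp = true_cases_unexp if remove_missed_cases else found_unexp
    pool_exp = n_exposed - removed_exp
    pool_unexp = n_unexposed - removed_unexp

    # Exposure odds among cases divided by exposure odds among controls.
    return (found_exp / found_unexp) / (pool_exp / pool_unexp)


# Hypothetical rare disease: risk 2 per 1000 (exposed) vs 1 per 1000 (unexposed).
base = dict(n_exposed=100_000, n_unexposed=100_000,
            risk_exposed=0.002, risk_unexposed=0.001)

or_complete = case_control_or(**base, sensitivity=1.0, remove_missed_cases=True)
or_cleaned = case_control_or(**base, sensitivity=0.8, remove_missed_cases=True)
or_contaminated = case_control_or(**base, sensitivity=0.8, remove_missed_cases=False)

print(f"complete ascertainment:          {or_complete:.4f}")
print(f"80% found, missed cases removed: {or_cleaned:.4f}")
print(f"80% found, missed cases kept:    {or_contaminated:.4f}")
```

With these assumed inputs the first two odds ratios agree, and leaving the missed cases in the control pool shifts the estimate toward the null by less than 0.001, in line with the negligible impact expected for a reasonably rare disease.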
In contrast, in a case-control study, disease overascertainment (imperfect specificity) will mix true cases with a potentially sizable number of misdiagnosed (false positive) cases, particularly if the disease is reasonably rare. Thus, the identified case group will have a blend of the exposure prevalence among true cases and the exposure prevalence among erroneously diagnosed false positive cases. This mixing will yield a bias towards the null, giving the observed case group an exposure prevalence between that of true cases and that of a random sample of the study base, represented by the false positives. Only when there is no association between exposure and disease, whereby cases would have the same exposure prevalence as the study base, will no bias result. Given the risk of overwhelming true cases with false positives when disease is rare, it is important in case-control studies to seek the maximum level of specificity even at the expense of some loss in sensitivity (Brenner & Savitz, 1990). Therefore, comparing results for varying levels of disease sensitivity and specificity (see Section below, "Examine results across levels of diagnostic certainty") suggests that the most valid estimates will be obtained for the most restrictive, stringent disease definitions. Given that only ratio measures of effect (odds ratios) can be assessed in case-control studies, all of the comments about bias due to nondifferential misclassification refer to the odds ratio.
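The blending of exposure prevalences can be illustrated numerically. This is a minimal sketch under stated assumptions: false positive cases and controls both have the exposure prevalence of the study base, and `ppv` (the proportion of identified cases that are true cases) is a hypothetical parameter introduced here for illustration. The prevalences are made-up values.

```python
def odds(p):
    """Convert a proportion to odds."""
    return p / (1 - p)


def observed_odds_ratio(prev_true_cases, prev_base, ppv):
    """Odds ratio when only a fraction `ppv` of identified cases are true cases.

    False positives are assumed to be a random sample of the study base,
    so they carry the base exposure prevalence, as do the controls.
    """
    prev_observed_cases = ppv * prev_true_cases + (1 - ppv) * prev_base
    return odds(prev_observed_cases) / odds(prev_base)


# Hypothetical exposure prevalences: 50% among true cases, 25% in the study base.
prev_true_cases = 0.50
prev_base = 0.25

for ppv in (1.0, 0.8, 0.6):
    or_obs = observed_odds_ratio(prev_true_cases, prev_base, ppv)
    print(f"proportion of true cases = {ppv:.1f}: odds ratio = {or_obs:.2f}")
```

As the share of false positives grows, the observed odds ratio moves steadily from its true value toward 1.0, which is why a stringent case definition is favored in this design.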
In contrast, nondifferential underascertainment of disease in cohort studies does not produce a bias in ratio measures of effect (risk ratios, odds ratios) (Poole, 1985; Rothman & Greenland, 1998). Assume that the disease identification mechanism, applied identically among exposed and unexposed subjects, successfully identifies 80% of the cases that are truly present. The absolute rate of disease will be 0.80 times its true value in both the exposed and unexposed groups. For ratio measures of effect, the sampling fractions cancel out, such that there is no bias—0.80 times the disease rate among exposed subjects divided by 0.80 times the disease rate among unexposed subjects produces an unbiased estimate of the risk ratio. Note the minimal assumptions required for this to be true: only disease underascertainment is present and it is identical in magnitude for exposed and unexposed subjects. If these constraints can be met, either in the study design or by stratification in the analysis, then unbiased measures of relative risk can be generated. In this situation, however, the measure of rate difference will be biased, proportionately smaller by the amount of underascertainment. For a given sampling fraction, for example, 0.80, the rate difference will be 0.80 times its true value: 0.80 times the rate in the exposed minus 0.80 times the rate in the unexposed equals 0.80 times the true difference.
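The cancellation can be verified directly. The sketch below uses the 0.80 sampling fraction from the paragraph above. The true rates are hypothetical values chosen only for illustration.

```python
def observed_rate(true_rate, sensitivity):
    """Rate observed when only a fraction `sensitivity` of true cases is identified."""
    return true_rate * sensitivity


# Hypothetical true rates per person-year (assumed for illustration).
true_exposed = 10 / 1000
true_unexposed = 5 / 1000
sensitivity = 0.80  # identical for exposed and unexposed (nondifferential)

obs_exposed = observed_rate(true_exposed, sensitivity)
obs_unexposed = observed_rate(true_unexposed, sensitivity)

true_rr = true_exposed / true_unexposed
obs_rr = obs_exposed / obs_unexposed
true_rd = true_exposed - true_unexposed
obs_rd = obs_exposed - obs_unexposed

print(f"rate ratio:      true {true_rr:.2f}, observed {obs_rr:.2f}")
print(f"rate difference: true {true_rd * 1000:.1f}, observed {obs_rd * 1000:.1f} per 1000")
```

The rate ratio is unchanged, while the observed rate difference is the true difference multiplied by the sampling fraction.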
For non-differential disease overascertainment, the consequences are the opposite with respect to ratio and difference measures, i.e., bias in ratio measures but not in difference measures of effect. In contrast to underascertainment, in which a constant fraction of the true cases is assumed to be missed, overascertainment is not proportionate to the number of true cases but instead to the size of the study base or denominator. That is, the observed disease incidence is the sum of the true disease incidence and the incidence of overascertainment, with the total number of false positive cases a function of the frequency of overascertainment and the size of the study base. If the disease incidence due to overascertainment is identical for exposed and unexposed subjects, the effect is an addition of the same constant to the true incidence in both groups. For ratio measures of effect (rate ratios, odds ratios), the addition of a constant to the numerator and denominator will yield an estimate that is biased towards the null. On the other hand, for measures of effect based on differences (risk or rate difference), the extra incidence due to false positive diagnoses will cancel out. Assume that the true incidence of disease is 10 per 1000 per year among the exposed and 5 per 1000 per year among the unexposed, for a rate ratio of 2.0 (10/1000 divided by 5/1000) and a rate difference of 5 per 1000 per year (10/1000 - 5/1000). If the overascertainment due to false positive diagnoses were 2 per 1000 per year among both the exposed and unexposed, the rate ratio would be biased toward the null as follows: among the exposed, the observed incidence rate would be 12 per 1000 (10/1000 true positives plus 2/1000 false positives) and among the unexposed, the observed incidence rate would be 7 per 1000 (5/1000 true positives plus 2/1000 false positives), for a rate ratio of 1.7 (12/1000 divided by 7/1000).
In general, for positive X and Y, the ratio of X plus a constant to Y plus the same positive constant is closer to 1.0 than X divided by Y (bias in ratio measures toward the null), so that this form of overascertainment always yields a bias toward the null. The rate difference, however, would not be affected: the observed rate difference of 12/1000 - 7/1000 = 5/1000 is the same as the true rate difference, 10/1000 - 5/1000 = 5/1000. In general, (X plus a constant) minus (Y plus the same constant) is the same as the difference between X and Y (no bias in rate difference). If we are aware that non-differential disease overascertainment is present, then difference measures would have an advantage over ratio measures in avoiding bias.
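The worked example can be reproduced in a few lines. The sketch below uses the rates given in the text (10, 5, and 2 per 1000 per year). The variable and function names are illustrative only.

```python
def ratio_and_difference(rate_exposed, rate_unexposed):
    """Return (rate ratio, rate difference) for two incidence rates."""
    return rate_exposed / rate_unexposed, rate_exposed - rate_unexposed


# Rates per 1000 per year, taken from the example in the text.
true_exposed, true_unexposed = 10, 5
false_positive_rate = 2  # same in both groups (nondifferential)

# Overascertainment adds the same constant to both groups.
obs_exposed = true_exposed + false_positive_rate
obs_unexposed = true_unexposed + false_positive_rate

true_rr, true_rd = ratio_and_difference(true_exposed, true_unexposed)
obs_rr, obs_rd = ratio_and_difference(obs_exposed, obs_unexposed)

print(f"rate ratio:      true {true_rr:.1f}, observed {obs_rr:.1f}")
print(f"rate difference: true {true_rd}, observed {obs_rd} per 1000 per year")
```

The observed rate ratio falls from 2.0 to about 1.7, while the rate difference stays at 5 per 1000 per year.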
When disease overascertainment or underascertainment differs according to exposure status (differential misclassification), the direction and magnitude of bias can still be predicted based on the direction and magnitude of error by determining which groups will be spuriously large and which groups will be spuriously small. If disease ascertainment is less complete among the unexposed, for example, then a bias towards a falsely elevated measure of association results. If disease overascertainment occurs preferentially among the unexposed, then the measure of effect will be biased downward. Any errors that inflate the rate of disease among the exposed or reduce the rate of disease among the unexposed will bias the measure of effect upwards, and errors that reduce the rate of disease among the exposed or inflate the rate of disease among the unexposed will bias the measure of effect downwards. Note that the null value does not provide a meaningful benchmark in assessing the effects of differential misclassification. The predicted direction of bias cannot be generalized as moving toward the null or away from the null given that the movement of the effect estimate due to misclassification is defined solely by its absolute direction, up or down. The true measure of effect, the one that would be obtained in the absence of disease misclassification, is artificially increased or decreased, and may cross the null value.
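A short calculation shows how the direction of bias follows from which group is inflated or deflated. This is a sketch under simplifying assumptions: underascertainment is modeled as a group-specific sensitivity and overascertainment as a group-specific additive false positive rate. All rates (per 1000 per year) and parameter values are hypothetical.

```python
def observed_rate_ratio(true_exposed, true_unexposed,
                        sens_exposed=1.0, sens_unexposed=1.0,
                        fp_exposed=0.0, fp_unexposed=0.0):
    """Rate ratio after group-specific under- and overascertainment.

    Each observed rate is (sensitivity x true rate) + false positive rate.
    """
    obs_exposed = sens_exposed * true_exposed + fp_exposed
    obs_unexposed = sens_unexposed * true_unexposed + fp_unexposed
    return obs_exposed / obs_unexposed


# True rate ratio of 2.0 (10 vs 5 per 1000 per year, hypothetical).
less_complete_unexposed = observed_rate_ratio(10, 5, sens_unexposed=0.8)
false_pos_unexposed = observed_rate_ratio(10, 5, fp_unexposed=2.0)

# True rate ratio of 0.8 (4 vs 5 per 1000 per year, hypothetical).
crosses_null = observed_rate_ratio(4, 5, sens_unexposed=0.6)

print(f"missed cases among unexposed:    {less_complete_unexposed:.2f} (true 2.00)")
print(f"false positives among unexposed: {false_pos_unexposed:.2f} (true 2.00)")
print(f"missed cases among unexposed:    {crosses_null:.2f} (true 0.80)")
```

In the last scenario a truly inverse association appears positive, which illustrates why the null value is not a useful benchmark for differential misclassification.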
With this background on the consequences of disease misclassification, the challenge is to make practical use of the principles to assess the potential for bias and to develop methods for minimizing or eliminating bias. First, we have to determine the situation that is operating in a specific study to know which principle to invoke. Identification of the study design should be straightforward, defined solely by whether sampling is outcome-dependent (case-control design) or not (cohort design) (Morgenstern & Thomas, 1993). Determining whether disease underascertainment, overascertainment, or both are present is not so easily achieved, requiring careful scrutiny of methods and results. The following discussion provides some strategies for evaluating the type and amount of disease misclassification, as well as methods for seeking to ensure that a given study has a known type of error that can be more readily managed. If the form of disease misclassification can be defined or constrained to one type, the impact on the results is at least predictable if not correctable. When both false positive and false negative errors are present in a dichotomous outcome and those errors are nondifferential with respect to exposure, regardless of the design or measure of effect, bias toward the null will result.
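The closing statement about combined errors can be illustrated for a cohort design. The sketch below assumes that disease classification is described by a sensitivity and specificity applied equally to exposed and unexposed subjects. The true risks and the error rates are hypothetical values chosen for illustration.

```python
def observed_risk(true_risk, sensitivity, specificity):
    """Expected observed risk: detected true cases plus false positives among non-cases."""
    return sensitivity * true_risk + (1 - specificity) * (1 - true_risk)


# Hypothetical true risks and nondifferential error rates.
true_exposed, true_unexposed = 0.10, 0.05
sensitivity, specificity = 0.80, 0.98

obs_exposed = observed_risk(true_exposed, sensitivity, specificity)
obs_unexposed = observed_risk(true_unexposed, sensitivity, specificity)

true_rr = true_exposed / true_unexposed
obs_rr = obs_exposed / obs_unexposed
true_rd = true_exposed - true_unexposed
obs_rd = obs_exposed - obs_unexposed

print(f"risk ratio:      true {true_rr:.2f}, observed {obs_rr:.2f}")
print(f"risk difference: true {true_rd:.3f}, observed {obs_rd:.3f}")
```

Under these assumed values both the risk ratio and the risk difference fall between the null and their true values, so neither measure escapes the bias when both kinds of error are present.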