There are basically two ways to generate a series of consistent findings: they may be consistently right or consistently wrong. When an array of studies generates consistent findings, a reasonable inference might be that despite an array of potential biases in the individual studies, the problems are not so severe as to prevent the data from pointing in the direction of the truth. Hypothesized biases within an individual study cannot be confirmed or refuted, but it may be possible to define a gradation of susceptibility to such biases across a series of studies. If a series of studies with differing strengths and limitations, and thus varying vulnerability to bias, all generate broadly comparable measures of association, one might infer that the studies are all of sufficient quality to have accurately approximated the association of interest.

Unfortunately, it is also possible for a series of studies to generate consistently incorrect findings. There are often similarities across studies in the design or methods of conduct that could yield similarly erroneous results. For example, in studies of a stigmatized behavior, such as cocaine use, in relation to pregnancy outcome, there may be such severe underreporting as to yield null results across a series of studies. On the other hand, cocaine use is strongly associated with other adverse behaviors and circumstances that could confound the results, including tobacco and alcohol use and sexually transmitted infection. These tendencies may well hold across a wide range of populations. Thus, the observation of a consistent association with adverse pregnancy outcome (Holzman & Paneth, 1994) may well be a result of consistent confounding. Perhaps the key difference between asking whether a single study has yielded an erroneous result and whether a series of studies has consistently done so is that in the latter case, the search is for attributes common to the studies.
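The arithmetic of consistent confounding can be made concrete with a minimal sketch. It assumes a single binary confounder (tobacco use, say) that multiplies the risk of the outcome, and an exposure with no true effect at all. The three studies, their smoking prevalences, and the confounder risk ratios are hypothetical values chosen for illustration only; they are not drawn from Holzman and Paneth (1994) or any other source.

```python
def crude_risk_ratio(p_conf_exposed, p_conf_unexposed, rr_confounder, rr_exposure=1.0):
    """Expected crude risk ratio for the exposure when a binary confounder
    multiplies outcome risk by rr_confounder.

    The baseline risk cancels in the ratio, so it is taken as 1 here.
    """
    risk_exposed = rr_exposure * (1 - p_conf_exposed + p_conf_exposed * rr_confounder)
    risk_unexposed = 1 - p_conf_unexposed + p_conf_unexposed * rr_confounder
    return risk_exposed / risk_unexposed


# Hypothetical studies (illustrative numbers only):
# (label, confounder prevalence among exposed,
#  confounder prevalence among unexposed, confounder risk ratio)
hypothetical_studies = [
    ("Study A", 0.80, 0.30, 2.0),
    ("Study B", 0.70, 0.25, 1.8),
    ("Study C", 0.90, 0.40, 2.2),
]

# The true exposure effect is null (rr_exposure defaults to 1.0) in every study.
crude = {
    name: crude_risk_ratio(p_exp, p_unexp, rr_conf)
    for name, p_exp, p_unexp, rr_conf in hypothetical_studies
}

for name, value in crude.items():
    print(f"{name}: crude RR = {value:.2f} (true RR = 1.00)")
```

Under these assumed inputs, all three crude risk ratios fall between roughly 1.3 and 1.4 even though the exposure has no effect in any of the studies. The agreement among them reflects only the attribute the studies share, namely that the confounder is more common among the exposed.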

The credibility assigned to consistent results rests on the assumption, often implicit rather than explicit, that the studies compensate for one another's weaknesses. The possibility of substantial bias resulting from a methodologic flaw in a single study is countered by evidence from other studies that do not suffer from this weakness yet show the same result. This mutual compensation is central to the interpretation of a series of studies and therefore warrants closer examination.

