Researchers have long used federal court data assembled by the Administrative Office of the U.S. Courts (AO) and the Federal Judicial Center (FJC). The data include information about every case filed in federal district court and every appeal filed in the twelve non-specialized federal appellate courts. The varied uses of the AO database have led to its being called "by far the most prominent" database used by legal researchers for statistical analysis of case outcomes. Like many large data sets, the AO data are not completely accurate. Some reports address the AO data's reliability, but no systematic study of the AO's non-bankruptcy data has been published. In the course of a substantive study of federal litigation brought by inmates, one of us began to investigate the nature and rate of errors. That investigation exploited a technological innovation in federal court records: the availability of docket sheets over the Internet via the federal judiciary's Public Access to Court Electronic Records project (PACER). This Article applies a similar method to begin a more comprehensive assessment of the AO data's reliability.
Our study looks at two large categories of cases, torts and inmate civil rights, and separates two aspects of case outcomes: which party obtained judgment and the amount of the judgment when plaintiffs prevail. With respect to the coding for the party obtaining judgment, we find that the AO data are very accurate when they report a judgment for plaintiff or defendant, except in cases in which judgment is reported for plaintiff but damages are reported as zero. As to this anomalous category (which is far more significant in the inmate sample than in the torts sample), defendants are frequently the actual victors in the inmate cases. In addition, when the data report a judgment for "both" parties (a characterization that is ambiguous even as a matter of theory), the actual victor is nearly always the plaintiff. Because such cases are quite infrequent, this conclusion is premised on relatively few observations and merits further testing.
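The two coding patterns singled out above can be flagged mechanically before any outcome analysis. The sketch below is illustrative only. It assumes a simplified, hypothetical record with a `judgment_for` field ("plaintiff", "defendant", or "both") and an `award` field. These names are not AO or FJC codebook variables, and the sketch does not reproduce the study's actual procedure. It routes plaintiff judgments with a zero award, and all "both" judgments, to a verification pile, and it treats the remaining plaintiff and defendant judgments as likely accurate.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CaseRecord:
    # Hypothetical simplified record; field names are illustrative,
    # not actual AO/FJC codebook variables.
    case_id: str
    judgment_for: str  # "plaintiff", "defendant", or "both"
    award: float       # damages as coded; 0 when none reported


def reliability_flag(rec: CaseRecord) -> str:
    """Assign a record to one of the reliability categories described in the text."""
    if rec.judgment_for == "plaintiff" and rec.award == 0:
        # Anomalous category: defendant may be the actual victor.
        return "plaintiff_zero_award"
    if rec.judgment_for == "both":
        # Ambiguous category: actual victor is usually the plaintiff.
        return "both"
    # Plaintiff (with nonzero award) or defendant judgments: coded judgment is usually accurate.
    return "likely_accurate"


def tally_flags(records):
    """Count how many records fall into each category (e.g., to size a PACER check)."""
    return Counter(reliability_flag(r) for r in records)
```

For example, `tally_flags` applied to a sample shows how many docket sheets would need checking before the coded victor can be trusted in the two anomalous categories.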
With respect to award amounts, we find that the unmodified AO data are more error prone, but that the data remain usable for many research purposes. Although the data systematically overestimate the mean award, they apparently yield a more accurate estimate of the median award. Researchers and policymakers who want more precise estimates of mean and median awards have two reasonably efficient options. First, as described below, they can exclude two easily identified classes of awards whose values, as entered in the AO data, are evidently suspect. Second, using PACER or courthouse records, they can ascertain the true award in only the suspect cases, without having to research the full mass of cases. Either technique provides reasonable estimates of the median award. The second technique may provide a reasonable estimate of the mean award, at least for some case categories.
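A toy calculation shows why a few bad award entries distort the mean far more than the median, and how the two correction options differ. This is a minimal sketch under stated assumptions. The award figures are invented for illustration and are not study data. A hypothetical boolean `suspect` flag stands in for the two classes of suspect awards that the Article defines later. The `verified` mapping represents true awards looked up on PACER for the suspect cases only.

```python
from statistics import mean, median


def summarize(awards):
    """Return (mean, median) of a list of award amounts."""
    return mean(awards), median(awards)


def exclude_suspect(awards, suspect):
    """Option 1: drop awards flagged as suspect (hypothetical flag)."""
    return [a for a, s in zip(awards, suspect) if not s]


def replace_suspect(awards, suspect, verified):
    """Option 2: swap in verified values (e.g., from docket sheets) for suspect awards only.

    `verified` maps a list index to the true award; suspect entries that were
    not looked up keep their coded value.
    """
    return [verified.get(i, a) if s else a
            for i, (a, s) in enumerate(zip(awards, suspect))]


# Invented figures: one inflated entry among otherwise plausible awards.
awards = [10_000, 25_000, 40_000, 60_000, 5_000_000]
suspect = [False, False, False, False, True]
verified = {4: 50_000}  # hypothetical true award found on PACER

print(summarize(awards))                                      # (1027000, 40000)
print(summarize(exclude_suspect(awards, suspect)))            # (33750, 32500.0)
print(summarize(replace_suspect(awards, suspect, verified)))  # (37000, 40000)
```

In this toy data, the single inflated entry pushes the mean above $1 million, while the median is unchanged. Exclusion shifts the median somewhat because it shrinks the sample. Replacing only the flagged value restores both statistics without re-examining the other cases.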