Jason Richwine, in a recent post at National Review's blog “The Corner”, expressed surprise at my interpretation of the estimated effects in the Head Start randomized controlled trial.
I had pointed out that the impact estimates, while not statistically significantly different from zero, are also not statistically significantly different from effects large enough to predict a 2 to 3% increase in adult earnings, which would probably be sufficient for Head Start to pass a benefit-cost test from earnings effects alone.
Richwine argues that the estimates and their confidence intervals also can’t rule out that Head Start has negative effects. He interprets my comments as arguing that the Head Start impact estimates are “large”. He concludes by arguing the following:
“Such analysis reverses the traditional burden of proof: Rather than showing that government preschool works, advocates now demand proof that it doesn’t work.”
These comments raise some interesting issues about how policymakers should make policy when given research that inevitably has some uncertainty about its estimates.
In making policy decisions, concepts such as the “burden of proof” are more confusing than helpful. The “burden of proof” is a legal concept used in court cases. In making policy, what we have are estimates with some uncertainty, and we have to decide what policy rules are likely over the long haul to maximize net social benefits.
If the only evidence on public pre-K were the Head Start experiment, policymakers would face a difficult decision based on highly uncertain evidence. The point estimates of test score effects at the end of 3rd grade suggest a little more than a 1% increase in adult earnings. This is a modest-sized effect, in my opinion, not a “large effect”, although what is “large” or “modest” is a highly subjective judgment, not a rigorous scientific judgment. But because earnings summed over an entire career are so large, even a 1% increase would amount to many thousands of dollars. The present value of this earnings gain would probably exceed $5,000. Head Start costs more than that, but then Head Start also clearly has benefits in the value of the child care services it provides to parents. So the point estimate implies a close call on net benefits.
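The arithmetic behind “many thousands of dollars” can be sketched as a present-value calculation. The following is a minimal illustration, not the calculation behind the $5,000 figure above: the baseline earnings, discount rate, and working ages are all hypothetical placeholders chosen only to show the mechanics, and the function name is my own.

```python
def present_value_of_earnings_gain(annual_earnings, pct_gain, discount_rate,
                                   first_work_year, last_work_year):
    """Present value, as of the program year (year 0), of a constant
    percentage gain in annual earnings over a working career.

    first_work_year and last_work_year are counted in years after the
    program, so a 4-year-old who works from age 22 to 65 has years 18 to 61.
    """
    annual_gain = annual_earnings * pct_gain
    return sum(
        annual_gain / (1.0 + discount_rate) ** year
        for year in range(first_work_year, last_work_year + 1)
    )


# All inputs below are HYPOTHETICAL placeholders, not figures from the study.
pv = present_value_of_earnings_gain(
    annual_earnings=40_000,   # assumed average annual earnings
    pct_gain=0.01,            # a 1% earnings increase
    discount_rate=0.03,       # assumed real discount rate
    first_work_year=18,       # child is 4 at program time, starts work at 22
    last_work_year=61,        # works through age 65
)
print(f"Present value of the earnings gain: ${pv:,.0f}")
```

Because the gain is discounted over many years before earnings even begin, the result is quite sensitive to the assumed discount rate; a higher rate shrinks the present value noticeably.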
Furthermore, there is significant uncertainty in these estimates. The confidence interval includes zero and negative effects, as well as positive effects two or three times as large. How should policymakers deal with such uncertainty?
One approach is to take a skeptical attitude, and assume effects are zero until proven otherwise. But this skeptical approach would not be a particularly good policy rule to adopt if one were faced with many policy decisions over a long period of time. If a policymaker were simply trying to maximize the expected present value of net benefits over thousands of policy decisions, each with evidence from only one experiment, then the optimal decision rule would be to use each experiment’s point estimate to guide decisions, regardless of the confidence intervals. Because each point estimate represents the mean expected impact of the intervention, following this rule will, over time, maximize net social benefits.
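A small simulation can illustrate this comparison of decision rules. This is only a sketch under assumptions I am adding for illustration: true net benefits are drawn from a normal distribution centered on zero, each experiment's estimate equals the truth plus normal noise, the policymaker is risk-neutral, and there is no other prior evidence. The function name and all parameter values are hypothetical.

```python
import random


def simulate_decision_rules(n_decisions=10_000, effect_sd=1.0,
                            standard_error=1.0, seed=0):
    """Compare total true net benefits under two adoption rules.

    Point-estimate rule: adopt whenever the estimate is positive.
    Skeptical rule: adopt only when the estimate is statistically
    significantly greater than zero (estimate > 1.96 standard errors).
    """
    rng = random.Random(seed)
    total_point_rule = 0.0
    total_skeptical_rule = 0.0
    for _ in range(n_decisions):
        # ASSUMPTION: true net benefits are centered on zero.
        true_net_benefit = rng.gauss(0.0, effect_sd)
        # One noisy experiment per policy decision.
        estimate = true_net_benefit + rng.gauss(0.0, standard_error)
        if estimate > 0:
            total_point_rule += true_net_benefit
        if estimate > 1.96 * standard_error:
            total_skeptical_rule += true_net_benefit
    return total_point_rule, total_skeptical_rule


point_total, skeptical_total = simulate_decision_rules()
print("Total net benefit, point-estimate rule:", round(point_total, 1))
print("Total net benefit, skeptical rule:     ", round(skeptical_total, 1))
```

In this setup the skeptical rule avoids some harmful programs, but it also passes up many programs with positive true effects whose estimates happen not to reach significance, so it accumulates fewer net benefits. With a different prior, for example one in which most programs are known to be harmful, the best threshold would shift.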
In other words, the legal “burden of proof” principle is not a particularly good guide to making policy decisions over time. The legal rule that we should convict someone of a crime only if they are guilty “beyond a reasonable doubt” is ultimately based on the judgment that we find it socially abhorrent to deprive someone of their life or liberty based on any lesser standard. The huge social cost of convicting an innocent person is not really relevant to deciding whether to spend a little more or less on some social or educational program. The costs of mistakenly expanding a social or educational program are not as great as the cost of locking someone up because the probability is 51% that they are guilty.
Another important point is that the Head Start experiment is NOT the only good evidence on the effects of pre-K. We have good evidence from two randomized experiments, Perry and Abecedarian, that pre-K can have large long-run effects. For example, long-run earnings effects are 19% in Perry. We also have good evidence from some natural experiments of long-run earnings effects, for example 8% in the Chicago Child-Parent Center study and 11% in Deming’s study of Head Start. Finally, we have some good natural experiments, for example in Tulsa and Boston, that show short-run test score effects of pre-K that are larger than found in the Head Start experiment.
In social science, or for that matter natural science, how we interpret any new experiment is influenced by what we already know. If we have substantial reasons from prior research to believe that variable X affects outcome Y, then in considering new evidence, our prior belief is not that X has no effect on Y. In interpreting the new research, we would ask whether the estimated effects in the new research are consistent not only with a null hypothesis of zero effects, but also with a null hypothesis of the estimated effects implied by prior research. Both of these null hypotheses are interesting to explore. If the new research shows lower effects of X on Y than implied by prior research, this should move us toward believing that X has lower effects than we previously thought.
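The idea of checking a new estimate against two null hypotheses, and then revising beliefs, can be sketched in a few lines. This is an illustration only: the numbers are hypothetical placeholders rather than figures from any of the studies discussed here, the function names are my own, and the updating step assumes a simple normal prior and normal sampling error, which is one convenient modeling choice among many.

```python
import math


def z_statistic(estimate, standard_error, null_value):
    """Distance of the estimate from a null value, in standard errors."""
    return (estimate - null_value) / standard_error


def rejects_null(estimate, standard_error, null_value, critical_z=1.96):
    """True if the estimate differs from the null at roughly the 5% level."""
    return abs(z_statistic(estimate, standard_error, null_value)) > critical_z


def updated_belief(prior_mean, prior_sd, estimate, standard_error):
    """Precision-weighted average of prior belief and new estimate.

    ASSUMPTION: normal prior and normal sampling error.
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / standard_error ** 2
    total_precision = prior_precision + data_precision
    posterior_mean = (prior_precision * prior_mean
                      + data_precision * estimate) / total_precision
    posterior_sd = math.sqrt(1.0 / total_precision)
    return posterior_mean, posterior_sd


# HYPOTHETICAL values, in percentage points of adult earnings.
new_estimate, new_se = 1.0, 1.0      # new experiment
prior_effect, prior_sd = 2.5, 1.0    # effect implied by prior research

print("Rejects zero effect?        ", rejects_null(new_estimate, new_se, 0.0))
print("Rejects prior-implied effect?",
      rejects_null(new_estimate, new_se, prior_effect))
print("Updated belief (mean, sd):  ",
      updated_belief(prior_effect, prior_sd, new_estimate, new_se))
```

With these placeholder inputs, the new estimate is consistent with both null hypotheses, yet the updated belief still lands below the prior-implied effect: the new evidence pulls our expectation down without overturning the earlier research.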
In the case of the Head Start experiment, the modest effects found should influence researchers towards believing that at least some pre-K programs have considerably smaller effects than found by Perry or the Chicago Child-Parent Center study or the Tulsa or Boston studies. It should also influence us towards wondering whether Head Start as of the 2002 experiment might have lower effects than it did in the past. And it might influence us towards desiring to reform Head Start to increase its effectiveness, in part by imitating the practices of pre-K programs that have larger estimated effects. As Barnett has pointed out, there is some evidence that Head Start has increased its educational effectiveness since the time of the 2002 experiment.
The Head Start experiment by itself is not strong evidence in favor of public pre-K. But it is not the only evidence, and it is not necessarily inconsistent with this other evidence. On the whole, the weight of the evidence, as suggested by a number of reviews of the research, is that high-quality pre-K programs can make a significant difference in improving the opportunities of children. In the bulk of the research, the estimated benefits are significantly greater than program costs.