In the debate over President Obama’s pre-K proposal, one important issue is whether pre-K programs can work at a large scale, not just in small “hothouse” programs run by researchers. A closely related issue is whether pre-K works for middle-class children, or just for low-income children.
In this policy debate, one recent piece that has received some attention is a short commentary by Russ Whitehurst, director of the Brown Center on Education Policy at the Brookings Institution, and former director of the Institute of Education Sciences within the U.S. Department of Education. Dr. Whitehurst focuses on the evidence from research on state pre-K programs. He argues that this state-focused research provides only weak evidence for the efficacy of pre-K: he characterizes the studies as “thin empirical gruel”.
Dr. Whitehurst’s main argument is that the state pre-K programs have not been evaluated using random assignment experiments. He argues that there are various methodological problems with the current research studies on state pre-K programs.
In an ideal world for researchers, we would want as much evidence as possible from studies with random assignment of children to either receive or be denied pre-K services. With such random assignment, if implemented perfectly, the children who receive pre-K services and the children who do not will on average be the same in both observed and unobserved characteristics. Any differences in outcomes between the pre-K and non-pre-K groups can then be plausibly attributed to the pre-K program, not to what types of families select pre-K services or how the pre-K program selects children. However, in the real world, it is not always possible to perfectly implement random assignment experiments for all programs in all places. For example, if not all participants in a random assignment experiment have usable data, differential attrition between the two groups of children, those who did and did not receive pre-K services, may lead to differences in unobserved characteristics that bias the estimated program effects. We therefore need to look to other methodologies that can provide useful research evidence.
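To make the selection problem concrete, here is a minimal simulation sketch. All names and numbers are hypothetical, not drawn from any of the studies discussed: it simply shows that randomizing enrollment balances an unobserved family trait across the two groups, while letting families self-select into pre-K does not.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 10_000
# Hypothetical unobserved "family support" trait that affects child outcomes.
support = rng.normal(0, 1, n)

# Random assignment: enrollment is independent of the trait.
treat_random = rng.integers(0, 2, n)

# Self-selection: families with more support are likelier to enroll
# (logistic selection, an illustrative assumption).
p_enroll = 1.0 / (1.0 + np.exp(-support))
treat_selected = (rng.uniform(size=n) < p_enroll).astype(int)

def support_gap(treat):
    """Mean unobserved-trait difference between enrolled and non-enrolled."""
    return support[treat == 1].mean() - support[treat == 0].mean()

print(f"trait gap under random assignment: {support_gap(treat_random):+.3f}")
print(f"trait gap under self-selection:    {support_gap(treat_selected):+.3f}")
```

Under random assignment the gap is essentially zero, so any outcome difference can be credited to the program; under self-selection the enrolled group starts out ahead on the unobserved trait, which would be mistaken for a program effect.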
In my opinion, Dr. Whitehurst significantly understates the quantity and quality of the existing research evidence for the effectiveness of large-scale state and local pre-K programs. Although these research studies do not typically use random assignment, they often rely on “natural experiments”. In these “natural experiments”, the assignment of children to receive pre-K program services either varies with factors that are plausibly unrelated to unobserved characteristics of children, or we do a good job of controlling for children’s characteristics.
These “natural experiments” provide evidence that is almost as reliable as the evidence from real-world random assignment experiments, which have their own imperfections. A perfectly implemented random assignment experiment may be the “gold standard” for research credibility. But a well-done study of a natural experiment can provide “silver standard” evidence. When such “silver standard” evidence consistently points to the effectiveness of large-scale pre-K programs, the quantity and quality of such evidence provides a sufficient basis for policymakers to implement large-scale pre-K programs with reasonable confidence of success.
Dr. Whitehurst’s review of the large-scale pre-K studies overlooks many studies that provide relevant evidence. Most importantly, he omits evidence from the Chicago Child-Parent Center program. Studies of this program rely on differences in access to the program’s pre-K services across similar low-income neighborhoods in Chicago – a “natural experiment”. A series of such studies have found evidence that the Child-Parent Center program reduces crime and increases educational attainment and earnings. Dr. Whitehurst also omits favorable evidence for the effectiveness of various large-scale state pre-K programs, for example in Arkansas, New Jersey, and Tennessee. The Arkansas and New Jersey studies try to control for observable characteristics in finding comparison groups. The Tennessee study used a random assignment experiment, the “gold standard”, but with considerable problems of missing data for some study participants.
Dr. Whitehurst is also too quick to dismiss the many studies that use a technique called “regression discontinuity” to evaluate pre-K programs’ effects on kindergarten readiness. In these studies, the researcher uses information on two groups of students: one group that has just started the pre-K program, and another group that has just started kindergarten after completing the pre-K program. The two groups take the same assessment tests. The methodology examines the data for a large “jump” in test scores, above what would be predicted from aging alone, between students who were a few days too young to enter pre-K the previous year, and are therefore just starting pre-K, and students who just made the age cutoff for pre-K the previous year, and are therefore starting kindergarten this year. The argument for this regression discontinuity methodology is that children who are only a few days apart in age should be similar in all observable and unobservable characteristics. As a result, any large jump in test scores at the age cutoff can be plausibly attributed to the pre-K program.
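The logic of the design can be sketched in a small simulation. Everything here is hypothetical (the effect size, score scale, and bandwidth are made-up numbers, not estimates from any actual pre-K study): scores rise smoothly with age, pre-K completers get an assumed boost, and fitting a line on each side of the cutoff recovers the size of the jump.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: birthdays fall on either side of an age cutoff.
# Children at or above the cutoff completed pre-K last year and are now
# entering kindergarten; those just below it are only now starting pre-K.
n = 5_000
age = rng.uniform(-1.0, 1.0, n)          # age in years relative to the cutoff
completed_prek = (age >= 0).astype(float)

true_effect = 5.0                         # assumed pre-K boost, in score points
# Test scores rise smoothly with age, plus a jump for pre-K completers.
score = 50 + 10 * age + true_effect * completed_prek + rng.normal(0, 3, n)

# Local linear regression: fit a separate line on each side of the cutoff
# within a narrow bandwidth, then compare the two fitted values at the cutoff.
bandwidth = 0.5
left = (age < 0) & (age > -bandwidth)
right = (age >= 0) & (age < bandwidth)

fit_left = np.polyfit(age[left], score[left], 1)
fit_right = np.polyfit(age[right], score[right], 1)

jump = np.polyval(fit_right, 0.0) - np.polyval(fit_left, 0.0)
print(f"Estimated discontinuity at the cutoff: {jump:.2f} points")
```

Because age itself is controlled for by the two fitted lines, the estimated jump isolates the discontinuous difference at the cutoff, which is exactly where the only systematic difference between children is a year of pre-K.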
These “regression discontinuity” studies of state pre-K programs often find sizable effects of large-scale state pre-K programs in improving kindergarten readiness. These studies also find that pre-K has large benefits for middle-class children, not just low-income children.
Dr. Whitehurst’s argument against this regression discontinuity approach is that the test score differences could be due not to the pre-K program, but to differences in how parents treat children who are only a few days apart in age. The argument is that parents who know their son or daughter will go to kindergarten next year, compared to parents whose child is a few days younger and therefore will not, may either work harder to make their child ready for kindergarten, or do more to expose their child to older playmates.
Dr. Whitehurst’s argument is theoretically possible. But is it really empirically plausible that such large test score effects will occur due to such possible differential parent behavior for children only a few days apart in age? I doubt that any such differences are even close to sufficient to account for the large test score effects we see.
Furthermore, the regression discontinuity studies consistently find effects on math test scores, not just on vocabulary and literacy test scores. Compared to literacy achievement, and especially to vocabulary achievement, math achievement depends more on school factors than on home factors. If Dr. Whitehurst’s hypothesis were correct, we might expect the regression discontinuity methodology to find significant effects on vocabulary but not on math. This is not what we find.
Finally, several of the regression discontinuity studies complement their regression discontinuity results with evidence using good comparison groups. The New Jersey, Arkansas, and Oklahoma studies use observable characteristics to compare pre-K participants with non-participants. The Tennessee study uses random assignment to make the same comparison. In these studies, both the regression discontinuity estimates and these alternative methodologies find positive effects of state pre-K programs on kindergarten readiness.
In sum, we have a large number of studies that point to the effectiveness of large-scale pre-K programs in improving outcomes for children. Although much of the evidence is not from random assignment experimentation, it is from studies with reasonable methodologies. The evidence is consistent with smaller-scale studies that do use random assignment. Given the difficulties in implementing random assignment on a large scale, and the large amount of money and time it would take to do so, it is unreasonable to demand that every aspect of pre-K policy be backed by a random assignment experiment.
Children are only 4 once. If we delay policy innovation to wait around for more and more long-term random assignment studies to be done, then there is a potential tremendous opportunity cost in not providing pre-K services to many cohorts of children.