This paper by Andrews, Jargowsky and Kuhn (henceforth the AJK paper) finds statistically significant effects of Texas’s large-scale but not universal pre-K program on 3rd grade test scores. AJK find that the program increases 3rd grade test scores by what education researchers call an “effect size” (a difference measured in standard deviations of the test score) of around 0.05, although the precise estimates vary with the specification. Most of these effects are highly statistically significant.
The estimated effects are not based on a random assignment experiment. However, they do rely on reasonable comparison groups. In some specifications, AJK compare children who participated in Texas pre-K with similarly disadvantaged children in the same school district who did not participate. In other specifications, they compare children in the same school districts before and after the district decided to participate in the state program. So the estimates are not experimental, but they rest on good “quasi-experimental” evidence, in that they control for many possible differences between the treatment and comparison groups.
What do the estimates mean for research on pre-K programs? First, as the authors highlight, they show that even pre-K programs rated as lower quality can have statistically significant effects on later academic outcomes. As AJK point out, Texas’s pre-K program is not rated highly by the National Institute for Early Education Research (NIEER), which rates Texas as meeting only 4 of its 10 quality benchmarks. For example, according to NIEER, Texas has no required maximum class size or child-staff ratio, and no state monitoring of quality. On the other hand, the Texas program does have early learning standards, and it requires the lead teacher to have a BA along with some specialized training in pre-K.
Second, it is important to note that only this study’s large sample size allowed effects of this magnitude to be detected as statistically significant. Some of the estimates use samples of over 500,000 students, which makes them quite precise. Most studies of pre-K have much smaller samples: the Perry Preschool study had only 123 participants, the Chicago Child-Parent Center study had around 1,400, my recent study with Gormley and Adelstein of Tulsa’s pre-K program had around 2,600, and the recent national Head Start experiment had almost 5,000. A standard power calculation suggests that adequate statistical “power” to detect the effects found in Texas would require a study sample of a little over 12,000 children. (That is, to have a power of 0.80, or an 80% probability of detecting an effect that is statistically significant at the 5% level, when the true effect size is 0.05, would require a sample of over 12,000.)
Thus, the implication is that for at least some pre-K programs, namely those with modest effects, existing studies are often “under-powered”: that is, their samples are too small to detect plausible effects of the program.
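The power calculation described above can be sketched with Python’s standard library. This is a minimal sketch under stated assumptions, not necessarily the calculation behind the figure in the text: it assumes a simple two-sided, two-sample z-test comparing equal-sized treatment and comparison groups, with the effect size in standard-deviation units and no covariate adjustment or clustering. The helper names are hypothetical, and the study sizes are the rounded totals quoted above.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Hypothetical helper: children needed in EACH of two equal-sized groups.

    Assumes a two-sided, two-sample z-test with the outcome in
    standard-deviation units, so effect_size is a standardized difference.
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = STD_NORMAL.inv_cdf(power)           # about 0.84 for power = 0.80
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


def power_given_total_n(effect_size: float, total_n: int, alpha: float = 0.05) -> float:
    """Hypothetical helper: power when total_n children are split evenly."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    n_arm = total_n / 2
    se = sqrt(2 / n_arm)       # standard error of a difference in means (SD units)
    shift = effect_size / se   # expected value of the z statistic
    # Probability that the z statistic lands in either rejection region.
    return STD_NORMAL.cdf(shift - z_alpha) + STD_NORMAL.cdf(-shift - z_alpha)


per_group = n_per_group(0.05)
print(per_group, 2 * per_group)  # 6280 12560

# Rounded study sizes from the text, treated (unrealistically) as even splits.
for label, n in [("Perry", 123), ("Chicago CPC", 1_400),
                 ("Tulsa", 2_600), ("Head Start", 5_000)]:
    print(f"{label}: power = {power_given_total_n(0.05, n):.2f}")
```

Under these assumptions, about 6,280 children per group (roughly 12,560 in total) are needed, in line with the “little over 12,000” figure; at the sample sizes of the smaller studies, the power to detect an effect size of 0.05 falls far short of 0.80. Adjusting for baseline covariates would raise power somewhat at a given sample size, while clustering by school or classroom would lower it.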
Third, the Texas pre-K program has considerably smaller effects than those found in high-quality studies of higher-quality pre-K programs. For example, Steve Barnett concludes from his review of the research that the average effect size we might expect from pre-K programs in elementary school is around 0.30, six times the effect found in Texas.
It is therefore important to emphasize that good studies have found that pre-K can achieve better results than those found in Texas.
Fourth, even the modest effects found in Texas might well lead to long-term benefits that exceed the program’s costs. Based on the estimated 3rd grade test score effects in Texas, I estimate that Texas’s program would increase the present value of future earnings for the average pre-K participant by around $3,500. The program’s costs are around $3,800 per child. So, based on 3rd grade test score effects alone, the program’s benefits and costs are about the same.
(These estimates use a methodology similar to that of the Bartik/Gormley/Adelstein paper forecasting the future earnings effects of Tulsa pre-K. I used estimates from Chetty et al. of how 3rd grade test scores affect adult earnings.)
However, earnings benefits predicted from 3rd grade test scores omit other benefits that would matter in a full benefit-cost analysis. One such benefit is reduced crime. In many benefit-cost studies, as summarized in my paper with Gormley and Adelstein, adult earnings benefits are half or less of the total social benefits of pre-K, with reduced crime making up most of the remaining benefits.
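The break-even arithmetic above, and how it might shift if non-earnings benefits were counted, can be sketched as follows. The dollar figures are the per-child estimates quoted in the text; the assumption that earnings would be exactly half of total social benefits in Texas is hypothetical, borrowed from the “half or less” pattern in other benefit-cost studies, and is not an estimate for the Texas program. The helper names are illustrative.

```python
# Per-child present values quoted in the text (dollars).
EARNINGS_BENEFIT = 3_500  # earnings gain forecast from 3rd grade test score effects
PROGRAM_COST = 3_800      # program cost per child


def benefit_cost_ratio(benefit: float, cost: float) -> float:
    """Benefits per dollar of cost; values above 1.0 mean benefits exceed costs."""
    return benefit / cost


def total_benefit(earnings_benefit: float, earnings_share: float) -> float:
    """Hypothetical: scale up earnings benefits, given their share of total benefits."""
    if not 0 < earnings_share <= 1:
        raise ValueError("earnings_share must be in (0, 1]")
    return earnings_benefit / earnings_share


earnings_only = benefit_cost_ratio(EARNINGS_BENEFIT, PROGRAM_COST)

# Assumption: earnings are exactly half of total social benefits. If they are
# "half or less", the total computed here is a lower bound.
with_other_benefits = benefit_cost_ratio(total_benefit(EARNINGS_BENEFIT, 0.5), PROGRAM_COST)

print(f"earnings only: {earnings_only:.2f}")              # earnings only: 0.92
print(f"with other benefits: {with_other_benefits:.2f}")  # with other benefits: 1.84
```

If earnings make up half of total benefits, the ratio roughly doubles to about 1.8; a smaller earnings share would raise it further.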
In addition, earnings benefits predicted from 3rd grade test scores may understate the future earnings effects of pre-K. In many studies of early childhood interventions, the initial effects on test scores fade over time. This pattern appears, for example, in Deming’s study of Head Start and in Chetty et al.’s study of the effects of kindergarten class quality. But these studies also find that effects on adult outcomes recover to roughly the level predicted by the initial test score effects. In Chetty et al.’s study, for example, the end-of-kindergarten test score effects of kindergarten class quality do a good job of predicting its adult earnings effects, whereas its 3rd grade test score effects would predict adult earnings effects only one-third as large as those actually observed.
What might cause this pattern of fading and recovery? The most plausible explanation is that early childhood programs have difficult-to-measure effects on “soft skills” or social skills that persist and significantly increase adult earnings.
In sum, I think the AJK study of Texas adds to the evidence that large-scale pre-K programs of modest quality can work. However, the available research suggests that higher-quality pre-K programs can accomplish a lot more. And the effects of higher-quality programs will be much easier to demonstrate to skeptics with the smaller sample sizes typically available to researchers.