Russ Whitehurst has some new comments on pre-K, this time arguing against a more recent study of Georgia pre-K. That study found pre-K effects on cognitive skills with an average “effect size,” across all tests used, of 0.69. This is quite high.
(“Effect size” is education research jargon for scaling the effects of some policy on test scores by dividing the effect by the “standard deviation” of the test score across students. This is an attempt to control for the arbitrariness of test score metrics by measuring the test score effect relative to how much this particular test score seems to vary in the sample.)
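As a minimal sketch of that calculation (using invented numbers, not data from the Georgia study), suppose some pre-K program raises scores on a particular test by 3.8 points:

```python
import statistics

# Invented test scores for illustration (not real study data).
scores = [52, 61, 47, 58, 66, 49, 55, 63, 50, 59]

# Suppose the estimated pre-K effect on this test is 3.8 points (hypothetical).
estimated_effect = 3.8

# Dividing by the standard deviation of scores across students converts the
# raw point gain on an arbitrary test metric into a comparable "effect size".
score_sd = statistics.stdev(scores)
effect_size = estimated_effect / score_sd
print(round(effect_size, 2))  # → 0.59
```

The same 3.8-point gain would yield a smaller effect size on a test whose scores vary more widely across students, which is the point of the rescaling.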
Whitehurst mainly argues against this study’s validity for two reasons, one of which is a weak argument, and the other of which is a stronger argument. First, he argues that there’s a problem in all regression discontinuity studies because some pre-K graduates inevitably disappear from the sample when they’re followed up on at the beginning of kindergarten. Although this sample attrition could bias program estimates, in either direction, in practice careful studies find that any such bias is small. For example, the Boston regression discontinuity study did numerous tests for possible biases and found no sign of them. The Kalamazoo study did some estimates that restricted the sample to only the same children observed prior to pre-K and after pre-K, and found no significant difference in the estimates.
A second and more valid concern is that the Georgia study has much larger sample attrition than these other studies, due to problems in obtaining consent from the families and schools of pre-K graduates entering kindergarten. Furthermore, there are some signs that this differential sample attrition led to the entering kindergarten sample being somewhat more advantaged. This differential in family consent rates could have led to more advantaged children being over-represented among program graduates, which might bias the study towards over-estimating program effects. I’m sure these issues will be discussed, and the estimates re-examined, as this report is submitted to academic journals and goes through the refereeing process.
Whitehurst also expresses some doubt about the large size of the estimated effects. The effects are large, although Whitehurst exaggerates the differentials from other research. The average effect size from previous studies is 0.35 in a meta-analysis by Duncan and Magnuson, and 0.31 in a meta-analysis by the Washington State Institute for Public Policy. These average effect sizes tend to be lower for more recent studies, and lower for Head Start than for state and local pre-K programs.
The regression discontinuity studies tend to get somewhat higher effect sizes. For example, the average effect size for the regression discontinuity study of Boston pre-K was 0.54.
But, as I have discussed previously, and as Whitehurst has alluded to previously, regression discontinuity studies of pre-K are estimating something a little bit different from other pre-K impact studies. Regression discontinuity studies are studying effects of pre-K for program graduates relative to what would have occurred if they had just missed the age cut-off for pre-K entrance and had not attended this subsidized pre-K until a year later. This means that regression discontinuity pre-K studies are in many cases comparing pre-K with no pre-K, as parents are less likely to enroll children in pre-K if they will not be attending kindergarten the next year. In contrast, other pre-K impact studies are measuring the effects of some public pre-K program relative to a comparison group which will be attending kindergarten the next year, and which is therefore more likely to attend pre-K. The fact that the comparison group is more likely to attend pre-K probably reduces the net impact estimates for these other pre-K studies.
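The arithmetic of that dilution can be sketched with hypothetical numbers (all invented for illustration, not estimates from any actual study):

```python
# Hypothetical numbers: why pre-K take-up in the comparison group shrinks
# the impacts measured by non-RD pre-K study designs.
effect_vs_no_prek = 0.6   # program effect relative to no pre-K (assumed)
effect_alt_prek = 0.4     # effect of alternative pre-K relative to none (assumed)
control_takeup = 0.5      # share of comparison group attending some pre-K (assumed)

# An RD study roughly compares against no pre-K, so it recovers about 0.6.
# Other designs compare against a group in which half attend other pre-K:
measured_impact = effect_vs_no_prek - control_takeup * effect_alt_prek
print(round(measured_impact, 2))  # → 0.4
```

Under these assumed numbers, the same program looks one-third smaller in a study whose comparison group has substantial pre-K enrollment, even though nothing about the program itself has changed.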
Which type of estimate is more useful? I think they’re both useful. The regression discontinuity results tell us something about the effects of pre-K versus no pre-K. This is useful for comparison with the gross costs of pre-K. The RD estimates are closer to what a labor economist would call “structural estimates” of the effects of pre-K, which can be useful for modeling the effects of other pre-K programs.
On the other hand, other pre-K estimates tell you the effects of this particular pre-K program versus whatever other pre-K programs are currently available in that particular pre-K marketplace. This is useful if the only policy we are considering is whether or not to adopt this particular pre-K program in this particular pre-K market. In that case, a benefit-cost analysis would have to compare the net benefits of this program versus the extra net social costs of substituting this new program for existing programs. In other words, the new program’s costs may be reduced considerably because it may save on the costs of existing pre-K programs, which means it doesn’t take as big an effect size for the program to pass a benefit-cost test.
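A stylized version of that benefit-cost logic, with all numbers invented purely for illustration:

```python
# Purely illustrative numbers (not from any study): how savings on existing
# programs lower the effect size a new pre-K program needs to break even.
gross_cost = 8000.0       # cost per child of the new program (assumed)
existing_cost = 6000.0    # cost per child of existing subsidized pre-K (assumed)
share_switching = 0.5     # share of enrollees drawn from existing programs (assumed)

# Net social cost = new spending minus savings on existing programs.
net_cost = gross_cost - share_switching * existing_cost
print(net_cost)  # → 5000.0

# Suppose each 1.0 of effect size is worth $10,000 in lifetime benefits
# (again, an invented number). Break-even effect sizes:
benefits_per_unit_effect = 10000.0
print(gross_cost / benefits_per_unit_effect)  # → 0.8 (ignoring savings)
print(net_cost / benefits_per_unit_effect)    # → 0.5 (counting savings)
```

Under these assumptions, counting the savings on displaced programs cuts the break-even effect size from 0.8 to 0.5, which is the sense in which substitution lowers the bar for passing a benefit-cost test.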
For both of these types of estimates, extrapolating the estimates to some other pre-K program in some other state or local area requires some assumptions. In general, introducing a new high-quality free pre-K program in any particular local area will result in some increases in pre-K enrollment in this program, and some reductions in enrollment in other programs, with the exact pattern depending on the program being introduced and what is currently available in that market. Neither the RD estimates, nor the estimated effects of some other pre-K program in some other market, will tell you the net benefits of a new pre-K program in a new market without further assumptions about program take-up of the new program versus the old programs, and without some assumptions about the relative quality of the new program versus the old programs.
In sum, I think the Georgia estimates are only suggestive, because of the problem of differential attrition in the treatment and control groups due to survey non-consent. The estimates may be correct, but this would require further analyses to demonstrate that the survey non-consent problem does not significantly bias the estimates. Because of this problem, I would currently give this study a grade for “internal validity” (or “research reliability”) of C, although further analyses by the authors examining this issue might move this grade up.
However, the Georgia estimates are not representative of most of the regression discontinuity studies, which have done further analyses which suggest that the estimates are not biased by problems with attrition.
Whitehurst also updates his analysis of the research, slightly downgrading, from A to A-, his grade for the “internal validity” (intuitively, research reliability) of the recent Tennessee study, which found quick fade-out of pre-K test score effects. But he does not note the factors that lead me to give the Tennessee study an “internal validity” grade of C: specifically, there was differential attrition in the control group due to problems of family consent, and the few estimates that did not suffer from this attrition bias suggest that the Tennessee program may have had greater effects than are found in the main estimates.
In other words, the Tennessee study actually has stronger evidence of biased estimates than is true of this recent Georgia study. However, for the Tennessee study, the bias appears to be leading the pre-K effects to be under-estimated. There certainly is no good reason to give the Tennessee study a higher grade for research reliability than the Georgia study.