An ongoing policy dispute concerns how effective Head Start is as a preschool program. Head Start also has other goals, such as improving public health. But an important issue is how Head Start’s effects on kindergarten readiness, K-12 test scores, and long-run educational attainment and earnings compare with the effects of other pre-K programs, or with no pre-K program at all.
I’ve written about Head Start’s effectiveness in previous blog posts. Here I want to offer some reflections on the latest research, including the recently released 3rd grade results from the random assignment Head Start study and the recent meta-analysis by Shager et al. of Head Start’s immediate cognitive effects. It is useful to compare these results with Deming’s study from a few years ago, which examined the time pattern of Head Start’s effects by comparing siblings who differed in Head Start participation.
What do we know about Head Start’s effects?
(1) Head Start significantly improves cognitive test scores at the end of Head Start, or at kindergarten entrance, compared to no preschool at all. The estimated effect size is on the order of 0.31 (see Shager et al.). “Effect size” is common jargon in educational statistics. It expresses Head Start’s effect relative to a measure of variation in test scores across students (the test score standard deviation) at the end of Head Start or the beginning of kindergarten. An effect size of 0.31 means that Head Start, compared to no preschool, increases students’ cognitive test score gains during the preschool year by about 40% over what they would have gained that year without Head Start or other preschool. (This calculation compares Head Start’s effect with how much students’ test scores typically rise with age, independent of preschool attendance, as calculated in Bartik, Gormley, and Adelstein.)
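To make the effect size arithmetic concrete, here is a minimal Python sketch. The test means, the standard deviation, and the function names are hypothetical; the only figures taken from the text are the 0.31 effect size and the roughly 40% comparison. The “typical yearly gain” is simply what those two numbers jointly imply (0.31 / 0.40, or about 0.78 standard deviations), not a figure reported by Bartik, Gormley, and Adelstein.

```python
def effect_size(mean_treated: float, mean_control: float, sd: float) -> float:
    """Standardized mean difference: the treatment-control gap in SD units."""
    return (mean_treated - mean_control) / sd


def relative_to_typical_gain(effect: float, typical_yearly_gain_sd: float) -> float:
    """Express an effect size as a fraction of a typical year's test-score gain."""
    return effect / typical_yearly_gain_sd


# Hypothetical test: control mean 100, SD 15, so a 0.31 SD effect is a 4.65-point gap.
es = effect_size(104.65, 100.0, 15.0)

# Yearly gain implied by the text's two numbers (0.31 SD is ~40% of a normal year's gain).
implied_yearly_gain = 0.31 / 0.40
share = relative_to_typical_gain(0.31, implied_yearly_gain)

print(f"effect size = {es:.2f}")
print(f"implied typical yearly gain = {implied_yearly_gain:.3f} SD")
print(f"Head Start effect relative to that gain = {share:.0%}")
```

Because the effect size is in standard deviation units, the same 0.31 would correspond to about 4.65 points on a hypothetical test with a standard deviation of 15.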
(2) Head Start’s immediate effects are significantly smaller when compared with a control group in which a sizable percentage attends other preschools. Shager et al. find that Head Start’s average immediate effects might be reduced by two thirds when a high percentage of the control group is enrolled in other preschool programs. This might occur because at least some of these other preschool programs have larger immediate effects on cognitive test scores than Head Start does. For example, Gormley et al. find that Tulsa’s state-funded “universal” pre-K program has about twice the immediate effect on cognitive test scores at kindergarten entrance as Tulsa’s Head Start programs.
(3) Head Start’s immediate effects on cognitive test scores decay quite a bit over time when measured in effect size units. However, studies differ somewhat on when this decay occurs and on its magnitude. These differences may reflect the period each study examines and what comparison is being made.
For example, the recent Head Start experiment examined Head Start’s effects for children who participated in Head Start in 2002-2003. These effects were measured relative to a control group many of whom attended other preschools, which may reduce both immediate and later effects. The study found that the immediate effect, at the end of preschool, of age 4 Head Start on three cognitive tests had an average effect size of 0.22. (This averages the “Impact on the Treated” effect sizes for three tests that were administered consistently over time: the PPVT, the WJ-III Letter-Word ID test, and the WJ-III Applied Problems test.) At the end of third grade, the average effect size on these same three tests was 0.06, over 70% lower, and none of the three effects was statistically significant.
Deming’s study of Head Start compared siblings who differed in whether they participated in Head Start or attended no preschool. The rationale is that although Head Start participation is not “randomly” assigned, comparing siblings should hold constant many unobserved family factors. Most of the children in Deming’s sample participated in Head Start sometime during the 1980s. Deming finds that Head Start has initial effects of about 0.15 in effect size units, compared to no preschool. Unlike in the recent Head Start experiment, these test score effects were mostly maintained when the former Head Start children were ages 7-10, declining only to an effect size of 0.13. But the effects deteriorated at ages 11-14 to an effect size of only 0.06, which is statistically insignificant.
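One way to compare these decay patterns is to express each later effect size as a share of the study’s initial effect. A minimal Python sketch, using only the average effect sizes quoted above (the helper name retained_share is hypothetical):

```python
# Average effect sizes quoted in the text, in order of measurement.
experiment = {"end of Head Start": 0.22, "end of 3rd grade": 0.06}
deming = {"initial": 0.15, "ages 7-10": 0.13, "ages 11-14": 0.06}


def retained_share(path: dict[str, float]) -> dict[str, float]:
    """Each effect size as a fraction of the first (initial) effect size."""
    initial = next(iter(path.values()))  # dicts keep insertion order
    return {label: es / initial for label, es in path.items()}


for name, path in [("Head Start experiment", experiment), ("Deming sibling study", deming)]:
    for label, share in retained_share(path).items():
        print(f"{name:22} {label:18} retained {share:5.0%}")
```

By this measure the experiment retains about 27% of its end-of-preschool effect by the end of third grade (hence “over 70% lower”), while Deming’s estimates retain about 87% at ages 7-10 and 40% at ages 11-14.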
Why the difference in the time pattern of decay? That is hard to say. It could be due to the different methodologies: a random assignment experiment versus a comparison of siblings who differed in Head Start participation. However, Shager et al. do not find that the immediate effects of Head Start are lower in experimental studies than in “quasi-experimental” studies such as Deming’s. And Deming’s study actually finds smaller immediate effects of Head Start than the recent random assignment experiment does.
The differences could instead be due to each study’s time period and comparison group. Head Start’s effects may deteriorate more quickly in a later period when measured against a group whose other preschool options have improved over time.
(4) Even after Head Start’s test score effects have significantly decreased, Deming’s study suggests that long-run effects of Head Start may be large. By large, I mean that these effects suggest that Head Start easily passes a benefit-cost test.
For example, Deming estimates that Head Start has an effect size of 0.23 on an index of outcomes measured at age 19 or older, covering educational attainment, crime involvement, employment status, unwed teen parenthood, and health. He predicts that this increase in his adult outcomes index would raise future wages by about 11%. This large predicted effect on wages occurs even though Head Start’s estimated effect on test scores at ages 11-14 is small and statistically insignificant. The real rate of return to Head Start is calculated to be 7.9%, which compares very favorably with the returns on many social and private investments.
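A “real rate of return” of this kind is an internal rate of return: the discount rate at which the present value of a program’s benefits just equals the present value of its costs. The Python sketch below finds such a rate by bisection. The cost and benefit stream is entirely hypothetical, chosen only to show the mechanics; it is not Deming’s data, and the resulting rate is not a replication of his 7.9% estimate.

```python
def npv(rate: float, flows: list[float]) -> float:
    """Net present value of annual flows; flows[t] occurs t years from now."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(flows))


def irr(flows: list[float], lo: float = 0.0, hi: float = 1.0, tol: float = 1e-10) -> float:
    """Internal rate of return by bisection: the rate where NPV crosses zero.

    Assumes a conventional stream (costs first, benefits later), so NPV
    falls as the rate rises and has a single root in [lo, hi].
    """
    if npv(lo, flows) * npv(hi, flows) > 0:
        raise ValueError("NPV does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if npv(mid, flows) > 0:
            lo = mid  # still profitable at this rate, so the root lies higher
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical stream in arbitrary units: a program cost of 8 in year 0,
# nothing for 15 years, then a benefit of 1 per year for 40 working years.
flows = [-8.0] + [0.0] * 15 + [1.0] * 40
print(f"IRR = {irr(flows):.1%}")
```

Bisection is used here because it is simple and, for a stream with all costs up front, is guaranteed to converge.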
Would this “bounceback” of long-term effects also occur for participants in the current Head Start random assignment experiment? That’s hard to say, in part because the current experiment shows faster deterioration of Head Start’s test score effects, and because it compares Head Start to a control group many of whom attended other preschools.
We do know that Head Start does not need long-term effects of this magnitude to pass a benefit-cost test. Deming calculates that at a 3% real discount rate, which is often used in social benefit-cost analyses, Head Start would pass a benefit-cost test even if its adult effects were 70% lower than he estimates. Boosting adult earnings by even a few percentage points has large social benefits.
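The logic of Deming’s robustness check can be sketched the same way: discount the benefit stream at a given rate, then ask how large a proportional cut in benefits the program could absorb before their present value falls below its cost. The Python below uses a hypothetical program (a cost of 8 up front and a benefit of 2 per year in years 16 through 55, in arbitrary units); these numbers are made up and say nothing about Head Start’s actual costs or benefits.

```python
def present_value(rate: float, flows: list[float]) -> float:
    """Present value of annual flows; flows[t] arrives t years from now."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(flows))


def max_benefit_reduction(rate: float, cost: float, benefits: list[float]) -> float:
    """Largest proportional cut in benefits that still leaves PV(benefits) >= cost.

    cost is paid up front (year 0); benefits[t] arrives in year t.
    """
    pv_benefits = present_value(rate, benefits)
    return max(0.0, 1.0 - cost / pv_benefits)


def passes_test(rate: float, cost: float, benefits: list[float], reduction: float) -> bool:
    """True if benefits cut by `reduction` (e.g. 0.70) still cover the cost."""
    scaled = [(1.0 - reduction) * b for b in benefits]
    return present_value(rate, scaled) >= cost


# Hypothetical program: cost 8 now; benefit 2 per year in years 16-55.
cost = 8.0
benefits = [0.0] * 16 + [2.0] * 40
for rate in (0.03, 0.05):
    cut = max_benefit_reduction(rate, cost, benefits)
    ok = passes_test(rate, cost, benefits, 0.70)
    print(f"rate {rate:.0%}: benefits can fall {cut:.0%} before failing; survives a 70% cut? {ok}")
```

When benefits arrive decades after the cost, a lower discount rate raises their present value, which is why the tolerable cut is larger at 3% than at 5%.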
The bounceback of long-run effects, after early effects on test scores have significantly decreased, is not unusual in early childhood programs. For example, contrary to recent claims by Whitehurst, the Perry Preschool program shows this pattern, although its test score effects deteriorated somewhat more slowly. In the Perry study, effects on various standardized tests had declined to very low effect sizes by age 8. The Perry treatment group, compared to the control group, showed an effect size at age 8 of only 0.06 on the PPVT, after an effect size of 0.91 following two years of preschool. (See Table 3.3 of the Age 40 follow-up to Perry.) (School achievement tests showed greater long-run effects in the Perry study, but such school-based achievement tests are not the sort of measure used in the Head Start experiment, probably because they are difficult to compile and compare across diverse schools.) Despite this deterioration in the Perry program’s estimated effects on standardized test scores, the Perry study showed strong long-run benefits for adult outcomes.
As another example, Chetty et al.’s recent analysis of the long-run effects of kindergarten class quality shows significant deterioration of test score effects, followed by a bounceback in effects on adult earnings. In their estimates, the effects of kindergarten class quality on 3rd grade test scores are around one-sixth of its effects on end-of-kindergarten test scores. (See their Figure VI.) But kindergarten class quality ends up having large effects on adult earnings. These earnings effects are more accurately predicted by kindergarten class quality’s effects on end-of-kindergarten test scores than by its effects on test scores in subsequent grades.
This bounceback may be due to harder-to-measure effects on soft skills, as hypothesized by Heckman. There is some support for this interpretation in Deming’s results. Deming finds sizable effects of Head Start on reducing grade retention and learning disability diagnoses, which might help explain Head Start’s long-term benefits.
What should be made of all this? First, the Head Start estimates don’t indicate that “preschool doesn’t work”. They may indicate that the average Head Start center’s performance is sometimes disappointing when compared with some of the best state pre-K programs.
Second, the estimates do suggest that Head Start needs to improve its average performance if the goal is to rank with the best public preschools in effects on cognitive outcomes.
Third, the results suggest caution in generalizing from medium-run effects of Head Start to effects on adult outcomes for former participants. Effects that disappear at one age can reappear at other ages. Sometimes the short-run effects of early childhood interventions are the best predictors of long-run effects.