What does the Fitzpatrick study of Georgia’s “universal” pre-K program show?

During the current debate over expanding pre-K, expansion opponents have sometimes cited a study of Georgia’s pre-K program by Maria Donovan Fitzpatrick.

For example, on February 25, 2013, the Wall Street Journal editorial page described the conclusions of Fitzpatrick’s study as follows:

“Careful work by Maria Donovan Fitzpatrick of the Stanford Institute for Economic Policy Research looked at student achievement in Georgia as its pre-K program phased in. While she found some modest gains, she also concluded that the costs outweighed the benefits by a ratio of six to one. Nearly 80% of enrollment is “just a transfer of income from the government to families of four year olds” who would have attended preschool anyway.”

I’ve re-read Professor Fitzpatrick’s study, which dates back to 2008. In my opinion, the study’s empirical results do not allow for any strong conclusions about whether Georgia’s expanded pre-K program passes or fails a benefit-cost test. There simply is too much noise in the estimates to allow for any precise conclusions about either the average benefits of pre-K in Georgia or the distribution of those benefits. In addition, a full benefit-cost analysis would require additional information or much stronger assumptions.

Proponents of universal pre-K and proponents of targeted pre-K can each point to selected estimates in Fitzpatrick’s study to support their positions. It is not obvious which estimates in the study are best, so no definitive conclusions can be reached.

Professor Fitzpatrick’s study tackles a quite challenging estimation problem. With evidence from one state that moved to widen access to pre-K, she is trying to estimate the effects of widening pre-K access on 4th grade test scores and other student outcomes. She compares trends in test scores and other 4th grade outcomes, from before and after Georgia significantly widened pre-K access, with the corresponding trends either for students in all other states or for students in a selectively weighted average of other states chosen to try to match Georgia’s prior trends.

The problem is that there are many other social, economic, and educational trends that affect outcomes for 4th graders in Georgia and other states. It is impossible to control for all these trends. These other forces result in a great deal of noise in the paper’s estimates. Results are quite sensitive to the different statistical techniques she uses to correct for that noise.

Professor Fitzpatrick presents a wide variety of estimates using many statistical techniques, which is an admirably open approach to presenting one’s research.  In interpreting these estimates, it is important to keep in mind that she is estimating impacts for all children in Georgia, or all children in selected groups in Georgia.  But in the time period she considers, Georgia pre-K access only increased to around 50% of all children, while it was also increasing in other states. Therefore, any estimated effects on all children are really attributable to a much smaller differential increase in access to pre-K in Georgia.

In other words, the average effects on one child getting access to pre-K are going to be much greater than the average effects on all children in Georgia, or all children in one group in Georgia. For example, if we assumed quality pre-K access increased by 50 percentage points in Georgia while staying the same in other states, which would be about the maximum differential Georgia trend, we would have to multiply her raw estimates by two to reflect the effects of Georgia pre-K on the extra students getting access to pre-K. It is these multiplied effects that would have to be compared with the extra costs of providing one student extra access to pre-K.
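
To make this scaling concrete, here is a minimal sketch (in Python) of the arithmetic just described. The 0.025 figure is the baseline effect size discussed in point 1 below, and the 50 percentage point differential is the assumption stated above rather than a number estimated in the study.

    # Minimal sketch of the scaling described above: an estimated effect on ALL
    # Georgia children is divided by the (assumed) differential increase in access
    # to recover the effect per child who actually gained access to pre-K.
    estimated_effect_on_all_children = 0.025   # baseline effect size from point 1 below
    differential_increase_in_access = 0.50     # assumed maximum differential Georgia trend
    effect_per_newly_served_child = estimated_effect_on_all_children / differential_increase_in_access
    print(effect_per_newly_served_child)       # 0.05, the effect to weigh against per-child costs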

Professor Fitzpatrick’s variety of estimates for “average effects” of Georgia pre-K includes the following:

1. Estimates comparing Georgia with all other states imply that Georgia pre-K increases overall Georgia average 4th grade math test scores by an “effect size” of 0.025 and 4th grade reading test scores by an effect size of 0.025 (Table 4, column II). If we multiply this by two, to reflect that only half of Georgia’s children were in pre-K in this period, and translate it into percentile terms (the arithmetic is sketched after this list), we get that Georgia pre-K might increase 4th grade test scores by about two percentiles. In the simplest specification, these effects are statistically significantly different from zero.

2. But when she statistically corrects for the problems that arise because only one state is considered to have changed its policy during this time period, these estimates are no longer statistically significantly different from zero (Table 4, column IV). This more sophisticated procedure, developed originally by Conley and Taber, allows for the fact that if we just observe one state that changes its policy, there could be chance events that affect its test scores. These chance effects are due not just to sampling error from a limited sample size, but also to systematic shocks to Georgia’s test scores from a wide variety of educational, economic, and social forces. The true effects of Georgia pre-K could be zero, or could be 3 or 4 times as great as Fitzpatrick’s baseline estimates in column II of Table 4. If we use the maximum plausible estimates from this revised procedure, and multiply by two to get effects on the child getting increased pre-K access, we find that Georgia pre-K might increase 4th grade test scores by about 7 percentiles. If we use the midpoint of the 90% confidence interval in Table 4, column IV, and multiply by two, we get that Georgia pre-K might be expected to increase 4th grade test scores by about 3 percentiles.

3. When Professor Fitzpatrick instead compares Georgia not with all other states but with a weighted average of other states, with weights chosen to match Georgia’s test score trends prior to widening pre-K access, the effects are smaller. The point estimates in Table 4, column VI, when multiplied by two, imply an increase in 4th grade test scores for students getting access to pre-K of about one percentile. However, the confidence intervals on these estimates are very wide when Professor Fitzpatrick uses the procedures recommended by Abadie et al. for doing such estimates. They would probably also be wide using the procedures recommended by Conley and Taber for calculating standard errors.
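
For readers who want the effect-size-to-percentile conversion spelled out, here is a minimal sketch. The doubling reflects the assumption, described earlier, that roughly half of Georgia’s children gained access; the 0.0875 input is simply 3.5 times the 0.025 baseline, chosen to represent the upper end of the “3 or 4 times” range discussed in point 2.

    # Minimal sketch of the effect-size-to-percentile conversion used in points 1-3.
    # An effect size is a gain in standard deviations of the test score distribution;
    # for a child starting at the median, the percentile gain is the standard normal
    # CDF evaluated at that gain, minus 50.
    from math import erf, sqrt

    def percentile_gain(effect_size_per_participant):
        """Percentile gain for a median child, given an effect size in standard deviations."""
        cdf = 0.5 * (1.0 + erf(effect_size_per_participant / sqrt(2.0)))
        return 100.0 * (cdf - 0.5)

    print(round(percentile_gain(2 * 0.025), 1))   # point 1 baseline: about 2 percentiles
    print(round(percentile_gain(2 * 0.0875), 1))  # point 2, roughly 3.5x baseline: about 7 percentiles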

So what is the true average effect of Georgia pre-K on 4th grade test scores for the children who get access to pre-K because of the pre-K expansion: 2 percentiles, zero percentiles, 7 percentiles, 3 percentiles, or 1 percentile? We simply don’t know, because there are advantages and disadvantages to all these approaches. Furthermore, the confidence intervals are wide enough that none of these estimates can be regarded as precise.

Yet this makes a big difference. Research by Chetty et al. suggests that a 1 percentile increase in 4th grade test scores might increase the present value of lifetime earnings by about $2,000. Georgia pre-K costs about $4,000 per student. (This uses techniques similar to those used in Bartik, Gormley, and Adelstein to predict future earnings effects, but uses Chetty et al.’s estimates in their Appendix Table V of the effects of 4th grade test scores on future earnings.) So, if we rely solely on the effects of 4th grade test scores on future earnings, we need a 4th grade test score effect on Georgia pre-K participants of about 2 percentiles for Georgia pre-K to pass a benefit-cost test.
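
The breakeven arithmetic behind this paragraph can be sketched as follows; both dollar figures are the rough approximations cited above, not exact values.

    # Minimal sketch of the breakeven calculation described above.
    earnings_gain_per_percentile = 2000   # approx. present value of lifetime earnings per percentile
    cost_per_student = 4000               # approx. Georgia pre-K cost per student
    breakeven_percentile_gain = cost_per_student / earnings_gain_per_percentile
    print(breakeven_percentile_gain)      # 2.0: roughly a 2 percentile gain at 4th grade is needed to break even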

But plausible estimates from Professor Fitzpatrick of this 4th grade test score effect range from zero to a 7 percentile effect. Therefore, relying only on these average effects on 4th grade test scores, Professor Fitzpatrick’s estimates imply that Georgia pre-K might have zero earnings benefits, or might have earnings benefits of over three times its costs.

Furthermore, as Barnett has pointed out, and as was discussed in Bartik, Gormley, and Adelstein, relying only on earnings benefits of pre-K overlooks the anti-crime benefits of pre-K. We simply have no basis for estimating such benefits in Georgia. But if there are any such benefits, then these test score based projections of future earnings effects may significantly understate total social benefits.

In addition, research by Chetty et al., Deming, and others suggests that estimated effects of early childhood programs on 4th grade test scores may significantly understate long-run benefits for earnings. This understatement may be due to difficult-to-measure effects of early childhood programs in improving social skills (“soft skills”).

For example, in Chetty et al., the early childhood intervention they consider, an improvement in kindergarten “class quality”, has effects that decline during grade school before reappearing in surprisingly large effects on adult earnings. Chetty et al. find that their measure of kindergarten class quality produces effects on end-of-kindergarten test scores that would be expected to increase adult earnings for persons in their late 20s by about $600 per year (see Figure VI of the Chetty et al. paper). But kindergarten class quality’s effects on 4th grade test scores are small enough that adult earnings would be predicted to increase by less than $100 per year. The actual observed effect on adult earnings is about $500 per year. In other words, 4th grade test score effects understate the effects on adult earnings by a factor of five.

If we assumed the same understatement applied to the Georgia data, then we would only need an average test score effect at 4th grade of about 0.4 percentiles (= 2 percentiles divided by five) for Georgia pre-K to have benefits in increased present value of adult earnings that exceed costs.
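
Here is a minimal sketch of that adjustment, under the strong (and untested) assumption that the factor-of-five understatement in Chetty et al.’s kindergarten results carries over to Georgia’s 4th grade estimates.

    # Minimal sketch of the fade-out adjustment: if 4th grade scores understate adult
    # earnings effects by a factor of five, the breakeven 4th grade gain shrinks accordingly.
    breakeven_without_fadeout = 2.0   # percentiles, from the earlier breakeven calculation
    understatement_factor = 5.0       # assumed, based on the Chetty et al. kindergarten results
    print(breakeven_without_fadeout / understatement_factor)  # 0.4 percentiles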

The Wall Street Journal cites Professor Fitzpatrick’s paper as showing that Georgia’s pre-K program failed a benefit-cost test:

“She…concluded that the costs outweighed the benefits by a ratio of six to one”.

But the Wall Street Journal’s interpretation is not warranted by Fitzpatrick’s empirical results.

First, the six to one ratio reported by Professor Fitzpatrick is the ratio of program costs to the TAXES generated by her estimate of increased lifetime earnings. In other words, her reported ratio is a fiscal impact calculation, not a benefit-cost analysis. The point of making educational and social investments is not to make money for the government, although that is a nice eventuality in the rare cases where it happens. The point is to maximize total social benefits minus costs, summed over everyone in society. This would include the earnings benefits of pre-K for former participants, including the untaxed portion of those benefits. It would also include anti-crime benefits.

If one uses Professor Fitzpatrick’s assumptions to project total earnings benefits, the cost-to-benefit ratio becomes 1.6 to 1 rather than 6 to 1. (This calculation divides Professor Fitzpatrick’s figure for tax receipts on p. 31 by her assumption of a 30% tax rate to get total earnings benefits. Professor Fitzpatrick mentions in the paper the qualitative result that costs exceed her estimates of total earnings benefits.) Or, in other words, for every dollar invested, the program increases the present value of earnings by about $0.60 (= one over 1.6). If anti-crime benefits are added in, benefits could well exceed costs.
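
A minimal sketch of the conversion from a fiscal comparison to a benefit-cost comparison follows. Because the six-to-one figure is rounded, the output differs slightly from the 1.6-to-1 ratio reported above, which is based on the paper’s unrounded dollar amounts from p. 31 (not reproduced here).

    # Minimal sketch: total earnings benefits equal projected tax receipts divided by
    # the assumed tax rate, so a cost-to-TAXES ratio shrinks by the tax rate when it is
    # converted into a cost-to-EARNINGS ratio.
    tax_rate = 0.30           # Fitzpatrick's assumed tax rate
    cost_to_tax_ratio = 6.0   # the rounded six-to-one ratio discussed above
    cost_to_earnings_ratio = cost_to_tax_ratio * tax_rate
    print(round(cost_to_earnings_ratio, 1))        # about 1.8 with the rounded inputs
    print(round(1.0 / cost_to_earnings_ratio, 2))  # about $0.56 of earnings per $1 of cost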

Second, I believe that the earnings extrapolation done by Professor Fitzpatrick may well lead to some understatement of earnings benefits. Her earnings projection uses estimates from her study that imply that average 4th grade Georgia test scores increase due to Georgia pre-K by about 3.5 percentiles. (This uses her assumption of an effect size of 0.09 for 40% of the population, yielding an average effect size of 0.044, which is then multiplied by 2 to reflect that the program enrollment increase is at most half the population, and then translated into percentile effects.) Based on Chetty et al., I project that such a 4th grade test score effect would be expected to increase adult earnings by a present value of around $6,700. This is over one and a half times the cost per participant of Georgia’s program of around $4,000. If, as argued above, there is some fading of test score effects after kindergarten, followed by larger adult earnings effects, earnings effects might be five times as great as that $6,700 figure.
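
Here is a minimal sketch of that projection using the approximate inputs named in this paragraph. The roughly $1,900-per-percentile figure is an assumption backed out from the $6,700-for-3.5-percentiles projection; the projection itself rests on the more detailed calculation from Chetty et al.’s Appendix Table V.

    # Minimal sketch of the projection: convert the average effect size to a percentile
    # gain for participants, then to a present value of earnings, and compare with costs.
    from math import erf, sqrt

    def percentile_gain(effect_size):
        return 100.0 * (0.5 * (1.0 + erf(effect_size / sqrt(2.0))) - 0.5)

    effect_size_all_children = 0.044                        # average effect size used by Fitzpatrick
    effect_per_participant = 2 * effect_size_all_children   # enrollment rose by at most half the population
    gain = percentile_gain(effect_per_participant)
    print(round(gain, 1))                                   # about 3.5 percentiles

    earnings_per_percentile = 1900                          # rough assumption implied by the $6,700 projection
    print(round(gain * earnings_per_percentile / 4000, 1))  # about 1.7 times the ~$4,000 cost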

Why do I get different earnings effects estimates than Professor Fitzpatrick, starting with the same 4th grade test score effects? (She estimates that earnings benefits are 0.6 times as great as costs, whereas I estimate that earnings benefits are 1.7 times costs, so my estimated earnings benefits are over 2.5 times her estimates.) There are two reasons. First, she uses Murnane et al.’s estimates of how high school graduates’ test scores affect wages, whereas I use estimates from Chetty et al. of how 4th grade test scores affect earnings. Using 4th grade test score effects on adult earnings better matches what the Georgia study provides, which is estimated effects on 4th grade test scores.

Second, her calculations appear to assume that higher test scores have the same dollar effects on hourly wage rates at all ages. In contrast, I assume that higher test scores have the same percentage effect on earnings at all ages. Assuming constant percentage effects on wages and earnings is the more typical assumption among labor economists. It implies that educational and skill advantages translate into larger dollar earnings gains when individuals are in their prime earning years, which seems to be backed by research. In fact, the Murnane et al. research she cites assumes that test scores have constant percentage effects rather than constant dollar effects on wages.
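
To see why this choice matters, here is a purely hypothetical illustration. The age-earnings profile and the size of the gains are invented for the example, and discounting is ignored; nothing below comes from Fitzpatrick’s or Chetty et al.’s data.

    # Purely hypothetical illustration (invented age-earnings profile, no discounting):
    # a constant-dollar rule adds the same dollars at every age, while a constant-percentage
    # rule scales with earnings and so adds more dollars in the prime earning years.
    # The two rules are calibrated to coincide at age 25.
    earnings_by_age = {25: 30000, 35: 45000, 45: 55000, 55: 50000}  # invented profile

    flat_dollar_gain = 500        # +$500 of earnings at every sample age
    percent_gain = 500 / 30000    # the same gain at age 25, expressed as a share of earnings

    total_flat = sum(flat_dollar_gain for _ in earnings_by_age)
    total_percent = sum(earnings * percent_gain for earnings in earnings_by_age.values())

    print(total_flat)             # 2000 summed across the four sample ages
    print(round(total_percent))   # 3000: larger, because the gain scales with prime-age earnings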

Which set of assumptions is right? Without actual data on adult earnings of former Georgia pre-K participants, it is hard to know for sure.

Certainly we can agree that such estimates are quite sensitive to a variety of extrapolation techniques. As Professor Fitzpatrick states,

“Such a calculation is difficult because the long-term impacts of the program on wages are not yet known…This is a very simple cost benefit analysis and should therefore be interpreted with caution.”

The Wall Street Journal has not interpreted Fitzpatrick’s estimates with caution.

Finally, the Wall Street Journal includes the following quotation, although it is not identified as being from Professor Fitzpatrick:

“Nearly 80% of enrollment is “just a transfer of income from the government to families of four year olds” who would have attended preschool anyway.”

I have been unable to find a quotation to this effect in Professor Fitzpatrick’s writing. However, Professor Fitzpatrick does at several points imply that she believes her research evidence is consistent with the notion that universal pre-K’s benefits are targeted within certain groups. Her interpretation of her study’s results is that when she estimates effects by income group and geographic area, she obtains a more consistent pattern of statistically significant estimates for needier children in rural and “urban fringe” areas than for less needy children or for children in urban areas.

 “The results in Table 6 suggest that disadvantaged Caucasian children in rural and urban fringe areas are those most likely to gain from Universal Pre-K availability. The math scores of these children increase by 6 to 9 percent of a standard deviation. Their reading scores increase by 3 to 7 percent of a standard deviation and they are at least 2 percentage points more likely to be on-grade for their age. Though the effects are not as consistently statistically significant, there is also a pattern in the results suggesting that other children in rural and urban fringe areas had improved academic achievement related to the program’s availability. The math scores of NSLP-ineligible Caucasian students went up by 4 to 9 percent of a standard deviation. Rural African-American students who are ineligible for NSLP score 5 percent of a standard deviation higher on math tests. African-American disadvantaged students in rural areas score 12 percent of a standard deviation higher on reading tests in fourth grade because of the program’s availability. Additionally, almost all students in rural areas are more likely to be on-grade for their age (the exception is Caucasians who are not eligible for NSLP) as are disadvantaged students in urban fringe areas.”

“Gains in the academic achievement of children living in urban areas also were seen. For example, African-American children in urban areas who are ineligible for the NSLP score 8.7 percent of a standard deviation higher on reading tests and are 6.8 percentage points more likely to be on-grade because of Universal Pre-K availability. African-American children who are eligible for the  NSLP in urban areas are also 7 percentage points more likely to be on-grade for their age. Lastly the test scores of Caucasian children in urban areas who are ineligible for NSLP increased by 2 percent of a standard deviation. However, it is difficult to make conclusions from these results for children in urban areas because the increases were not more consistent across outcomes.”

Several aspects of this discussion, and of Table 6, should be noted. First, based on previous analyses in the paper, the standard errors of all these estimates are probably understated, and the statistical significance overstated. As was shown earlier in the paper, if we allow for the statistical noise due to the possibility of random shocks to Georgia’s children’s achievement, our confidence that the estimates are precise is diminished. Therefore, it is difficult to know what to make of counts of statistically significant coefficients across different groups when those counts are based on understated standard errors.

Second, even if we accept these standard errors, and just count statistically significant coefficients in Professor Fitzpatrick’s Table 6, there are plenty of statistically significant effects for non-needy children and for children in urban areas. As she notes, white children and African-American children in urban areas who are ineligible for subsidized lunch (she uses the shorthand NSLP, for the National School Lunch Program) show some statistically significant test score effects. For example, non-disadvantaged African-American children in urban areas have the second-highest estimated reading test score effect from the Georgia pre-K program out of the 12 groups considered (differentiated by race, income, and geographic location).

Third, we would not necessarily expect consistently statistically significant effects across all outcomes even if the Georgia pre-K program has positive effects for all groups.  As Steve Barnett has emphasized, effects of a preschool program in reducing grade retention (or, in Professor Fitzpatrick’s terminology, increasing the percentage of students who are on-grade for their age) may mask positive effects on test scores.  If the preschool program increases the percentage of marginal students who are promoted at the appropriate ages to the next grade, this will tend to depress average test scores of those tested in 4th grade. For example, the strongest effect in Professor Fitzpatrick’s estimates for “on grade” is for African-American children in urban areas who are eligible for a subsidized lunch. This group shows no effects of Georgia’s program on test scores. But if we had comparable test score information on all children in all states of the appropriate age, whether or not they were retained in grade, we might well find that Georgia’s program had positive effects on test scores for African-American children in urban areas who are eligible for a subsidized lunch.
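
A purely hypothetical numerical example of this composition effect (none of these numbers come from the Georgia data):

    # Purely hypothetical numbers: without pre-K, 90 of 100 children are on grade in
    # 4th grade and the tested group averages the 55th percentile; pre-K moves 10
    # formerly retained children, averaging the 20th percentile, into the tested group.
    avg_tested_without_prek = 55.0
    avg_tested_with_prek = (90 * 55.0 + 10 * 20.0) / 100
    print(avg_tested_without_prek, avg_tested_with_prek)  # 55.0 51.5: the average falls though no child scored lower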

Professor Fitzpatrick ends up concluding that universal pre-K increases academic achievement of “disadvantaged children in rural or urban fringe areas [who make] up about 19 percent of the student population in Georgia”, but is more cautious about effects of universal pre-K on “other groups” (non-disadvantaged or in urban areas):

 “Statistically significant gains for other groups of children are also seen on some of the measures of academic achievement but not all, which leads me to be cautious in making any conclusions about the effects of the program for these groups. These first estimates of the longer-term effects of Universal Pre-K support the findings in the literature that gains from Universal Pre-K programs are not universal, but are “targeted” within certain groups.”

I think the first sentence from this quotation is quite consistent with her research evidence: some positive effects for a wide variety of groups, but considerable uncertainty. However, the second sentence holds only if one’s prior belief is that the gains from pre-K occur only for targeted groups. This second sentence overstates what the literature shows today, as there is a wide variety of research that supports benefits of pre-K for non-disadvantaged as well as disadvantaged groups (e.g., Bartik, Gormley, and Adelstein, as well as Gormley’s previous work; results for Oklahoma and West Virginia in NIEER research; results for Brigham Young University’s preschool program). Of course, not all of this research was available when Professor Fitzpatrick’s paper was drafted.

In sum, the Wall Street Journal’s interpretation of the statistical findings from Professor Fitzpatrick’s work is unduly pessimistic.  The estimates have a great deal of uncertainty. They are quite consistent with a wide variety of benefit-cost ratios, including benefit-cost ratios greater than one. And the results provide some evidence for a wide variety of groups benefiting from universal pre-K, not just the disadvantaged in urban fringe and rural areas.

About timbartik

Tim Bartik is a senior economist at the Upjohn Institute for Employment Research, a non-profit and non-partisan research organization in Kalamazoo, Michigan. His research specializes in state and local economic development policies and local labor markets.

One Response to What does the Fitzpatrick study of Georgia’s “universal” pre-K program show?

  1. Phil Gordon says:

    Tim- Once again, thank you for the very thorough and thoughtful analyses. I wonder if limiting the analyses to income, or even, as you suggest, including crime prevention, is still too limiting. It seems that if we include the potential health improvements, and therefore the public money spent on healthcare, then the economic balance would certainly tip heavily in favor of pre-K.

    Then there is the squishy issue of well-being (including positive emotion, engagement, relationships, meaning, and achievement; see M. Seligman, Flourish, p. 24). I would love your thoughtful take on what including well-being as a measure of the long-term outcomes of early childhood supports could mean (ibid., chapter 10, for an introduction to the economics of well-being).


