What does the Fitzpatrick study of Georgia’s “universal” pre-K program show?

During the current debate over expanding pre-K, expansion opponents have sometimes cited a study by Maria Donovan Fitzpatrick of Georgia’s pre-K program.

For example, on February 25, 2013, the Wall Street Journal editorial page described the conclusions of Fitzpatrick’s study as follows:

“Careful work by Maria Donovan Fitzpatrick of the Stanford Institute for Economic Policy Research looked at student achievement in Georgia as its pre-K program phased in. While she found some modest gains, she also concluded that the costs outweighed the benefits by a ratio of six to one. Nearly 80% of enrollment is “just a transfer of income from the government to families of four year olds” who would have attended preschool anyway.”

I’ve re-read Professor Fitzpatrick’s study, which dates back to 2008. In my opinion, the study’s empirical results do not allow for any strong conclusions about whether Georgia’s expanded pre-K program passes or fails a benefit-cost test. The estimates are simply too noisy to support precise conclusions about either the average benefits of pre-K in Georgia or the distribution of those benefits. In addition, a full benefit-cost analysis would require additional information or much stronger assumptions.

Proponents of universal versus targeted pre-K can both point to selected estimates in Fitzpatrick’s study to support their positions. It is not obvious which estimates in the study are best, so no definitive conclusions can be reached.

Professor Fitzpatrick’s study tackles a quite challenging estimation problem. With evidence from one state that moved to widen access to pre-K, she tries to estimate the effects of widening pre-K access on 4th grade test scores and other student outcomes. She compares trends in 4th grade test scores and other outcomes, from before and after Georgia significantly widened pre-K access, with trends either for students in all other states or for students in a selectively weighted average of other states, chosen to try to match Georgia’s prior trends.

The problem is that there are many other social, economic, and educational trends that affect outcomes for 4th graders in Georgia and other states. It is impossible to control for all these trends. These other forces result in a great deal of noise in the paper’s estimates. Results are quite sensitive to the different statistical techniques she uses to correct for that noise.

Professor Fitzpatrick presents a wide variety of estimates using many statistical techniques, which is an admirably open approach to presenting one’s research.  In interpreting these estimates, it is important to keep in mind that she is estimating impacts for all children in Georgia, or all children in selected groups in Georgia.  But in the time period she considers, Georgia pre-K access only increased to around 50% of all children, while it was also increasing in other states. Therefore, any estimated effects on all children are really attributable to a much smaller differential increase in access to pre-K in Georgia.

In other words, the average effect on one child getting access to pre-K is going to be much greater than the average effect on all children in Georgia, or on all children in one group in Georgia. For example, if we assumed quality pre-K access increased by 50 percentage points in Georgia, while staying the same in other states, which would be about the maximum differential Georgia trend, we would have to multiply her raw estimates by two to reflect effects of Georgia pre-K on the extra students getting access to pre-K. It is these multiplied effects that would have to be compared with the extra costs of providing one student extra access to pre-K.

Professor Fitzpatrick’s estimates of the “average effects” of Georgia pre-K include the following:

1. Estimates comparing Georgia with all other states imply that Georgia pre-K increases overall Georgia average 4th grade math test scores by an “effect size” of 0.025 and 4th grade reading test scores by an effect size of 0.025 (Table 4, column II). If we multiply this by two, to reflect that only half of Georgia’s children were in pre-K in this period, and translate this into percentile terms (as sketched in the code after this list), we get that Georgia pre-K might increase 4th grade test scores by about two percentiles. In the simplest specification, these effects are statistically significantly different from zero.

2. But when she statistically corrects for the problems that arise because only one state is considered to have changed its policy during this time period, these estimates are no longer statistically significantly different from zero (Table 4, column IV). This more sophisticated procedure, developed originally by Conley and Taber, allows for the fact that if we just observe one state that changes its policy, there could be chance events that affect its test scores. These chance effects are due not just to sampling error from a limited sample size, but also to systematic shocks to Georgia’s test scores from a wide variety of educational, economic, and social forces. (A toy sketch of this placebo-style logic appears below.) The true effects of Georgia pre-K could be zero, or could be 3 or 4 times as great as Fitzpatrick’s baseline estimates in column II of Table 4. If we use the maximum plausible estimates from this revised procedure, and multiply by two to get effects on the child getting increased pre-K access, we find that Georgia pre-K might increase 4th grade test scores by about 7 percentiles. If we use the midpoint of the 90% confidence interval in Table 4, column IV, and multiply by two, we get that Georgia pre-K might be expected to increase 4th grade test scores by about 3 percentiles.

3. When Professor Fitzpatrick instead compares Georgia not with all other states, but with a weighted average of other states, with weights chosen to match Georgia’s test score trends prior to widening pre-K access, the estimated effects are smaller. The point estimates in Table 4, column VI, when multiplied by two, imply an increase in 4th grade test scores for students getting access to pre-K of about one percentile. However, the confidence intervals on these estimates are very wide when Professor Fitzpatrick uses the procedures recommended by Abadie et al. for doing such estimates. They would probably also be wide using the procedures recommended by Conley and Taber for calculating standard errors.
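For readers who want to check this arithmetic, here is a minimal sketch of the effect-size-to-percentile conversion used throughout this post. It assumes test scores are normally distributed, so that an effect size in standard deviation units translates into percentiles via the normal CDF; the function name and the rounding are my own.

```python
from scipy.stats import norm

def effect_size_to_percentiles(effect_size):
    """Percentile gain for a student starting at the median, given an
    effect size in standard deviation units and normally distributed scores."""
    return 100 * (norm.cdf(effect_size) - norm.cdf(0.0))

# Item 1: the 0.025 baseline estimate, doubled because only about half of
# Georgia's children gained pre-K access over this period:
print(effect_size_to_percentiles(2 * 0.025))   # roughly 2 percentiles

# Item 2: an upper-end estimate of roughly 3.5 times the baseline, doubled:
print(effect_size_to_percentiles(2 * 0.0875))  # roughly 7 percentiles
```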

So what is the true average effect of Georgia pre-K on 4th grade test scores for the children who get access to pre-K because of the pre-K expansion: 2 percentiles, zero percentiles, 7 percentiles, 3 percentiles, or 1 percentile? We simply don’t know, because there are advantages and disadvantages to all these approaches. Furthermore, the confidence intervals are wide enough that none of these estimates is precise.
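To see why the Conley and Taber correction widens the confidence intervals so much, here is a toy placebo exercise on simulated data. This is only a sketch of the intuition, not the authors’ actual estimator: each control state is treated, in turn, as if it had been the policy-changing state, and the spread of those placebo “effects” shows how large a purely Georgia-specific shock could be by chance. All numbers are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# One "treated" state plus 40 controls, each with its own systematic shock
# to its pre/post change in test scores (sd of 0.05 is illustrative):
n_controls = 40
change = rng.normal(0.0, 0.05, size=n_controls + 1)
change[0] += 0.025  # the treated state's change also includes a true effect

# Difference-in-differences estimate for the treated state:
dd_estimate = change[0] - change[1:].mean()

# Placebo distribution: pretend each control state was the treated one.
placebos = np.array([change[i] - np.delete(change[1:], i - 1).mean()
                     for i in range(1, n_controls + 1)])

# A 90% interval in the spirit of Conley and Taber: subtract the middle 90%
# of the placebo distribution from the estimate. It easily includes zero.
lo, hi = np.percentile(placebos, [5, 95])
print(dd_estimate - hi, dd_estimate - lo)
```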

Yet this makes a big difference. Research by Chetty et al. suggests that a 1 percentile increase in 4th grade test scores might increase the present value of lifetime earnings by about $2,000. Georgia pre-K costs about $4,000 per student. (This uses similar techniques to what was used in Bartik, Gormley and Adelstein to predict future earnings effects, but uses Chetty et al.’s estimates in their Appendix Table V of the effects of 4th grade test scores on future earnings.) So, if we rely solely on the 4th grade test scores’ effects on future earnings, we need a test score effect at 4th grade on Georgia pre-K participants of about 2 percentiles for Georgia pre-K to pass a benefit-cost test.
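A minimal sketch of that breakeven calculation, using the round numbers just cited (both figures are approximations):

```python
earnings_per_percentile = 2000  # approx. PV of lifetime earnings per 4th grade percentile (Chetty et al.)
cost_per_student = 4000         # approx. Georgia pre-K cost per student

breakeven_percentiles = cost_per_student / earnings_per_percentile
print(breakeven_percentiles)    # 2.0 percentiles needed to pass a benefit-cost test
```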

But plausible estimates from Professor Fitzpatrick of this 4th grade test score effect range from zero effect to a 7 percentile effect. Therefore, relying only on this information on average effects on 4th grade test scores, Professor Fitzpatrick’s estimates imply that Georgia pre-K might have zero earnings benefits, or might have earnings benefits of over three times its costs.

Furthermore, as Barnett has pointed out, and as was discussed in Bartik, Gormley, and Adelstein, relying only on earnings benefits of pre-K overlooks the anti-crime benefits of pre-K. We simply have no basis for estimating such benefits in Georgia. But if there are any such benefits, then these test score based projections of future earnings effects may significantly understate total social benefits.

In addition, research by Chetty et al., Deming, and others suggests that estimated effects of early childhood programs on 4th grade test scores may significantly understate long-run benefits for earnings. This understatement may be due to difficult-to-measure effects of early childhood programs in improving social skills (“soft skills”).

For example, in Chetty et al., the early childhood intervention they consider, which is an improvement in kindergarten “class quality”, has effects that decline during grade school before reappearing as surprisingly large effects on adult earnings. Chetty et al. find that their measure of kindergarten class quality causes effects on end-of-kindergarten test scores that would be expected to increase adult earnings for persons in their late 20s by about $600 per year (see Figure VI of the Chetty et al. paper). But kindergarten class quality’s effects on 4th grade test scores are small enough that adult earnings would be predicted to increase by less than $100 per year. The actual observed effect on adult earnings is about $500 per year. In other words, 4th grade test score effects understate the effects on adult earnings by a factor of five.

If we assume the same understatement applies to the Georgia data, then we would only need an average test score effect at 4th grade of about 0.4 percentiles (= 2 percentiles divided by five) for Georgia pre-K to have benefits in increased present value of adult earnings that exceed costs.
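The same breakeven arithmetic with the Chetty et al. fade-and-reemergence pattern folded in; the factor of five comes from the figures in the preceding paragraphs:

```python
predicted_from_k_scores = 600   # $/year adult earnings gain predicted from kindergarten scores
predicted_from_g4_scores = 100  # $/year predicted from the (faded) 4th grade scores
actual_adult_effect = 500       # $/year gain actually observed in adulthood

understatement = actual_adult_effect / predicted_from_g4_scores  # factor of ~5
print(2.0 / understatement)     # ~0.4 percentiles needed at 4th grade to break even
```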

The Wall Street Journal cites Professor Fitzpatrick’s paper as showing that Georgia’s pre-K program failed a benefit-cost test:

“She…concluded that the costs outweighed the benefits by a ratio of six to one”.

But the Wall Street Journal’s interpretation is not warranted by Fitzpatrick’s empirical results.

First, the six to one ratio that is reported by Professor Fitzpatrick is the ratio of program costs to the TAXES generated by her estimate of increased lifetime earnings. In other words, her reported ratio is a fiscal impact calculation, not a benefit-cost analysis. The point of making educational and social investments is not to make money for the government, although that is a nice eventuality in the rare cases where it happens. The point is to maximize total social benefits minus costs summed over everyone in society. This would include the earnings benefits of pre-K for former participants, including the untaxed portion of those benefits. It would also include anti-crime benefits.

If one uses Professor Fitzpatrick’s assumptions to project total earnings benefits, the cost-to-benefit ratio becomes 1.6 to 1 rather than 6 to 1. (This calculation divides Professor Fitzpatrick’s figure for tax receipts on p. 31 by her assumption of a 30% tax rate to get total earnings benefits. Professor Fitzpatrick mentions in the paper the qualitative result that costs exceed her estimates of total earnings benefits.) Or, in other words, for every dollar invested, the program increases the present value of earnings by about $0.60 (= 1/1.6). If anti-crime benefits are added in, benefits could well exceed costs.
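Here is that conversion in sketch form. Treating the rounded 6-to-1 figure as exact yields 1.8 to 1; Professor Fitzpatrick’s exact figures on p. 31 yield the 1.6 to 1 cited above, so the gap is just rounding:

```python
cost_to_tax_ratio = 6.0  # the cited ratio of program costs to tax receipts (rounded)
tax_rate = 0.30          # Fitzpatrick's assumed tax rate on earnings

# Total earnings benefits are tax receipts divided by the tax rate, so:
cost_to_earnings_ratio = cost_to_tax_ratio * tax_rate
print(cost_to_earnings_ratio)      # ~1.8 with the rounded inputs; ~1.6 with her exact figures
print(1 / cost_to_earnings_ratio)  # PV of earnings gained per dollar of cost, ~$0.60
```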

Second, I believe that the earnings extrapolation done by Professor Fitzpatrick may well lead to some understatement of earnings benefits. Her earnings benefits use estimates from her study that imply that average 4th grade Georgia test scores increase due to Georgia pre-K by about 3.5 percentiles. (This uses her assumption of an effect size of 0.09 for 40% of the population, yielding an average effect size of 0.044, which is then multiplied by 2 to reflect that the program enrollment increase is at most half the population, and then translated into percentile terms.) Based on Chetty et al., I project that such a 4th grade test score effect would be expected to increase adult earnings by a present value of around $6,700. This is over one and a half times the cost per participant of Georgia’s program of around $4,000. If, as argued above, there is some fading of test score effects after kindergarten, followed by larger adult earnings effects, earnings effects might be five times as great as that $6,700 figure.
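Here is that chain of calculations in sketch form, using the rounded inputs just described. It comes out near $7,000; my $6,700 figure reflects somewhat more precise inputs:

```python
from scipy.stats import norm

avg_effect_size = 0.044                  # Fitzpatrick's average effect size across the population
per_child_effect = 2 * avg_effect_size   # rescaled: enrollment increase is at most half the population

percentiles = 100 * (norm.cdf(per_child_effect) - 0.5)  # ~3.5 percentiles
earnings_pv = percentiles * 2000         # ~$2,000 per percentile, from Chetty et al. (approx.)
print(percentiles, earnings_pv)          # ~3.5 percentiles, ~$7,000 per participant
```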

Why do I get different earnings effects estimates than Professor Fitzpatrick, starting with the same 4th grade test score effects? (She estimates that earnings benefits are 0.6 times as great as costs, whereas I estimate that earnings benefits are 1.7 times costs, so my estimated earnings benefits are over 2.5 times her estimates.) Two reasons. First, she uses Murnane et al.’s estimates of how high school graduates’ test scores affect wages, whereas I use estimates from Chetty et al. of how 4th grade test scores affect earnings. Using 4th grade test score effects on adult earnings better matches what the Georgia study provides, which is estimated effects on 4th grade test scores.

Second, her calculations appear to assume that higher test scores have the same dollar effect on hourly wage rates at all ages. In contrast, I assume that higher test scores have the same percentage effect on earnings at all ages. Assuming constant percentage effects on wages and earnings is the more typical assumption among labor economists. It implies that educational and skill advantages translate into greater dollar earnings effects when individuals are in their prime earning years, which seems to be backed by research. In fact, the Murnane et al. research she cites assumes that test scores have constant percentage effects rather than constant dollar effects on wages.
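A toy illustration of why this assumption matters so much. The age-earnings profile below is invented (it is not from either paper); the point is simply that a constant-percentage effect scales up with earnings in the prime earning years, while a constant-dollar effect stays at its starting level:

```python
import numpy as np

ages = np.arange(25, 65)
# Invented hump-shaped earnings profile, peaking in mid-career:
earnings = 30000 + 1500 * (ages - 25) - 25 * (ages - 25) ** 2

discount = 1.03 ** -(ages - 25)  # 3% real discount rate, illustrative

gain_pct = 0.02 * earnings                      # constant 2% effect at every age
gain_usd = np.full_like(earnings, gain_pct[0])  # the same dollar gain as at age 25, at every age

print(np.sum(gain_pct * discount))  # PV under the constant-percentage assumption
print(np.sum(gain_usd * discount))  # PV under the constant-dollar assumption (smaller)
```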

Which set of assumptions is right? Without actual data on adult earnings of former Georgia pre-K participants, it is hard to know for sure.

Certainly we can agree that such estimates are quite sensitive to a variety of extrapolation techniques. As Professor Fitzpatrick states,

“Such a calculation is difficult because the long-term impacts of the program on wages are not yet known…This is a very simple cost benefit analysis and should therefore be interpreted with caution.”

The Wall Street Journal has not interpreted Fitzpatrick’s estimates with caution.

Finally, the Wall Street Journal includes the following quotation, although it is not identified as being from Professor Fitzpatrick:

“Nearly 80% of enrollment is “just a transfer of income from the government to families of four year olds” who would have attended preschool anyway.”

I have been unable to find a quotation to this effect in Professor Fitzpatrick’s writing. However, Professor Fitzpatrick does at several points imply that she believes that her research evidence is consistent with the notion that universal pre-K’s benefits are targeted within certain groups.  Her interpretation of her study’s results is that when she estimates effects by income group and geographic area, she obtains a more consistent pattern of statistically significant estimates for needier children in rural and “urban fringe” areas compared to less needy children in urban areas.

 “The results in Table 6 suggest that disadvantaged Caucasian children in rural and urban fringe areas are those most likely to gain from Universal Pre-K availability. The math scores of these children increase by 6 to 9 percent of a standard deviation. Their reading scores increase by 3 to 7 percent of a standard deviation and they are at least 2 percentage points more likely to be on-grade for their age. Though the effects are not as consistently statistically significant, there is also a pattern in the results suggesting that other children in rural and urban fringe areas had improved academic achievement related to the program’s availability. The math scores of NSLP-ineligible Caucasian students went up by 4 to 9 percent of a standard deviation. Rural African-American students who are ineligible for NSLP score 5 percent of a standard deviation higher on math tests. African-American disadvantaged students in rural areas score 12 percent of a standard deviation higher on reading tests in fourth grade because of the program’s availability. Additionally, almost all students in rural areas are more likely to be on-grade for their age (the exception is Caucasians who are not eligible for NSLP) as are disadvantaged students in urban fringe areas.”

“Gains in the academic achievement of children living in urban areas also were seen. For example, African-American children in urban areas who are ineligible for the NSLP score 8.7 percent of a standard deviation higher on reading tests and are 6.8 percentage points more likely to be on-grade because of Universal Pre-K availability. African-American children who are eligible for the  NSLP in urban areas are also 7 percentage points more likely to be on-grade for their age. Lastly the test scores of Caucasian children in urban areas who are ineligible for NSLP increased by 2 percent of a standard deviation. However, it is difficult to make conclusions from these results for children in urban areas because the increases were not more consistent across outcomes.”

Several aspects should be noted from this discussion, and from Table 6. First, based on previous analyses in this paper, the standard errors of all these estimates are probably understated, and the statistical significance overstated. As was shown earlier in the paper, if we allow for the statistical noise due to the possibility of random shocks to Georgia’s children’s achievement, our confidence that the estimates are precise is diminished. Therefore, it is difficult to know what to make of relative counts, across different groups, of statistically significant coefficients, when those counts are based on standard errors that are understated.

Second, even if we accept these standard errors, and just count statistically significant coefficients in Professor Fitzpatrick’s Table 6, there are plenty of statistically significant effects for non-needy children and in urban areas. As she notes, white children and African-American children in urban areas who are ineligible for subsidized lunch (she uses the shorthand NSLP) show some statistically significant test score effects. For example, non-disadvantaged African-American children in urban areas have the second-highest estimated reading test score effect (out of 12 groups considered, differentiated by race, income, and geographic location) from the Georgia pre-K program.

Third, we would not necessarily expect consistently statistically significant effects across all outcomes even if the Georgia pre-K program has positive effects for all groups.  As Steve Barnett has emphasized, effects of a preschool program in reducing grade retention (or, in Professor Fitzpatrick’s terminology, increasing the percentage of students who are on-grade for their age) may mask positive effects on test scores.  If the preschool program increases the percentage of marginal students who are promoted at the appropriate ages to the next grade, this will tend to depress average test scores of those tested in 4th grade. For example, the strongest effect in Professor Fitzpatrick’s estimates for “on grade” is for African-American children in urban areas who are eligible for a subsidized lunch. This group shows no effects of Georgia’s program on test scores. But if we had comparable test score information on all children in all states of the appropriate age, whether or not they were retained in grade, we might well find that Georgia’s program had positive effects on test scores for African-American children in urban areas who are eligible for a subsidized lunch.
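This selection effect is easy to demonstrate with simulated data. In the toy sketch below (my own illustration, not the Georgia data), pre-K raises every child’s latent achievement by the same amount, yet because it also promotes more marginal students into the tested on-grade pool, the measured gain among tested students is noticeably smaller than the true gain:

```python
import numpy as np

rng = np.random.default_rng(1)
latent = rng.normal(0, 1, 100_000)  # latent achievement without pre-K
cutoff = -1.0                       # children below this are retained, so not tested in grade 4

boost = 0.10                        # pre-K raises every child's latent achievement
with_prek = latent + boost

# Average scores among *tested* (on-grade) children:
tested_without = latent[latent > cutoff].mean()
tested_with = with_prek[with_prek > cutoff].mean()

print(tested_with - tested_without)  # ~0.06: noticeably below the true 0.10 boost
print((with_prek > cutoff).mean() - (latent > cutoff).mean())  # on-grade share rises ~2 points
```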

Professor Fitzpatrick ends up concluding that universal pre-K increases academic achievement of “disadvantaged children in rural or urban fringe areas [who make] up about 19 percent of the student population in Georgia”, but is more cautious about effects of universal pre-K on “other groups” (non-disadvantaged or in urban areas):

 “Statistically significant gains for other groups of children are also seen on some of the measures of academic achievement but not all, which leads me to be cautious in making any conclusions about the effects of the program for these groups. These first estimates of the longer-term effects of Universal Pre-K support the findings in the literature that gains from Universal Pre-K programs are not universal, but are “targeted” within certain groups.”

I think the first sentence from this quotation is quite consistent with her research evidence: some positive effects for a wide variety of groups, but considerable uncertainty. However, the second sentence only holds if one’s prior belief is that the gains from pre-K only occur for targeted groups.  This second sentence overstates what the literature shows today, as there is a wide variety of research that supports benefits of pre-K for non-disadvantaged groups, as well as disadvantaged groups (e.g., Bartik, Gormley, and Adelstein, as well as Gormley’s previous work; results for Oklahoma and West Virginia in NIEER research; results for Brigham Young University’s preschool program). Of course, not all this research was available when Professor Fitzpatrick’s paper was drafted.

In sum, the Wall Street Journal’s interpretation of the statistical findings from Professor Fitzpatrick’s work is unduly pessimistic.  The estimates have a great deal of uncertainty. They are quite consistent with a wide variety of benefit-cost ratios, including benefit-cost ratios greater than one. And the results provide some evidence for a wide variety of groups benefiting from universal pre-K, not just the disadvantaged in urban fringe and rural areas.


Responding to the Wall Street Journal editorial on preschool expansion

On February 26, the Wall Street Journal published an editorial criticizing President Obama’s proposal to expand preschool.  The editorial was entitled “Head Start for All: Universal preschool and a government that won’t admit failure”.  Given the prominence of the Wall Street Journal, this editorial has been widely circulated and cited by critics of proposals for expanding preschool.  In response to several reader requests, I am examining some of the claims made in the editorial.

To begin with, the editorial’s title, “Head Start for All”, does not appear to be an accurate description of President Obama’s proposal.  According to a White House fact sheet, President Obama’s preschool proposal would do the following:

“The President’s proposal will improve quality and expand access to preschool, through a cost sharing partnership with all 50 states, to extend federal funds to expand high-quality public preschool to reach all low- and moderate-income four-year olds from families at or below 200% of poverty. The U.S. Department of Education will allocate dollars to states based on their share of four-year olds from low- and moderate-income families and funds would be distributed to local school districts and other partner providers to implement the program. The proposal would include an incentive for states to broaden participation in their public preschool program for additional middle-class families, which states may choose to reach and serve in a variety of ways, such as a sliding-scale arrangement….”

“Under the President’s proposal, investment in the federal Head Start program will continue to grow.  The President’s plan will maintain and build on current Head Start investments, to support a greater share of infants, toddlers, and three-year olds in America’s Head Start centers, while state preschool settings will serve a greater share of four-year olds. “

Therefore, this proposal differs from “Head Start for All” in several respects:

  1. The proposal is largely funding expansion of state programs, not an expansion of directly federally-funded local Head Start centers.
  2. Under the proposal, many children would actually shift from Head Start centers at age 4 to state-designed programs to serve four-year-olds. It seems likely that there would be much more state and local flexibility under this program than under Head Start, although how much flexibility depends upon yet-to-be-released details in the proposal.
  3. The bulk of the federal funding goes to four-year olds in families below 200% of the poverty line, although it sounds as if some funds may encourage expansion of such programs with some fees to more middle-class families. While the proposal can be seen as expanding targeted preschool to more kids in a dramatic way with federal dollars, the degree to which the program advances truly universal pre-K will largely rest on state initiative and state dollars.

The editorial goes on to argue that the experience of Georgia, which has close to a universal pre-K program, shows that universal pre-K is ineffective:

“Careful work by Maria Donovan Fitzpatrick of the Stanford Institute for Economic Policy Research looked at student achievement in Georgia as its pre-K program phased in. While she found some modest gains, she also concluded that the costs outweighed the benefits by a ratio of six to one. Nearly 80% of enrollment is “just a transfer of income from the government to families of four year olds” who would have attended preschool anyway.”

My comment: I have recently re-read Fitzpatrick’s article, which was published in 2008. I have prepared a fuller analysis of what I think the article shows, but I have first shared this analysis with the author to allow a chance to respond before I post this full analysis. In brief, I believe the Wall Street Journal has significantly misinterpreted what the empirical findings of Fitzpatrick’s article show, in several respects:

(1)    The six to one ratio for costs to benefits is for a fiscal impact analysis that only counts gains in government revenue (from the increased earnings of former preschool participants) as benefits.  Yet a full benefit-cost analysis of any public policy should consider effects on all groups in society, not just the government. If one adds in benefits of preschool for after-tax earnings of former preschool participants, as well as social benefits from reduced crime, benefits could well exceed costs.

(2)    Fitzpatrick’s paper presents a wide range of estimated preschool effects using different methodologies, and these estimates vary widely in what they show about the plausible range of effects of Georgia’s preschool program. Depending upon what estimates are used, one can get widely varying benefit-cost ratios for Georgia’s preschool program, which differ in their bottom line conclusion about whether benefits exceed costs.

(3)     Fitzpatrick’s estimates include significant effects on both disadvantaged children and non-disadvantaged children.

The Wall Street Journal goes on to look at national trends and evidence on preschool:

“Nationwide today about 1.3 million kids, or 28% of all four-year-olds, attend state-funded pre-K, a leap from 14% in 2002. The empirical case for this expansion—the evidence that universal preschool “works,” as Mr. Obama put it—rests on two academic studies, the Abecedarian and Perry projects, conducted four and five decades ago.”

My comment: The empirical case for “universal” preschool does not rest solely on Abecedarian and Perry. These studies are targeted and small-scale, so they could hardly provide a complete case by themselves.

The empirical case for “large-scale” preschool – which might be either targeted on the disadvantaged, or universal – rests on many other studies that have looked at large-scale preschool programs, including the Chicago Child-Parent Center program, and many state pre-K programs. These studies show that preschool can work at a large scale in the short run (the state studies) and in the long run (the CPC study).

The empirical case for more universal programs rests on state studies that have looked at programs with broad eligibility (notably Oklahoma and West Virginia), as well as studies of Tulsa’s program. My study with Gormley and Adelstein of Tulsa’s universal preschool program explicitly compares short-run effects of preschool on middle-class vs. low-income children. We find similar effects on kindergarten entrance test scores, which would be projected to cause similar dollar effects on adult earnings.

The Wall Street Journal goes on to describe these experiments’ effects and costs, compared to Georgia:

“These experiments showed vast returns on investment, the source of Mr. Obama’s claim that every early education dollar generates $7 down the line. Yet Abecedarian and Perry cost between $16,000 to $41,000 per child per year (in current dollars), the higher end comparable to Ivy League tuition. Georgia spends $4,298 per child.”

“The extra money was required because these were very intensive interventions that included home visits, parent counseling, nutrition, health care and other social services. They were micro-enterprises run by the most experienced early education experts and impossible to replicate. Mr. Obama is simply pocketing their results and pretending that this can be extrapolated to the entire population. It can’t even be replicated in Georgia.”

My comment: The implication that we cannot get strong results even with cheaper programs that are focused on preschool services is incorrect. Most of the state and local preschool programs that have shown short-run and long-run benefits cost about $5,000 per year per child. For example, my Tulsa study with Gormley and Adelstein found favorable results for a half-day preschool program for 4-year-olds that cost $5,080 per student in 2012 dollars.  Georgia might be under-funded compared to what one would prefer for a high-quality preschool program, but the additional needed funding might be no more than another 20% per student.

The Wall Street Journal then goes on to claim that:

“…What “study after study” really suggest is that government-funded pre-K programs are best when they are targeted at low-income, disadvantaged or minority children—those with the most need. Such a modest, practical reform may lack Mr. Obama’s preferred political grandeur, but the other reason he didn’t propose it is that the government has already been doing it for a half-century. “

“That would be Lyndon Johnson’s Head Start program, birth date 1965. In December of last year, the Health and Human Services Department released the most comprehensive study of Head Start to date, which took years to prepare. The 346-page report followed toddlers who won lotteries to join Head Start in several states and those who didn’t through the third grade. There were no measurable differences between the two groups across 47 outcome measures. In other words, Head Start’s impact is no better than random. “

My comment: First, study after study does NOT show that government-funded pre-K programs are best when targeted. Most studies focus solely on the disadvantaged, and therefore provide no evidence for or against benefits for the non-disadvantaged. These studies do show strong benefits of high-quality programs for the disadvantaged. Those studies that include the non-disadvantaged, as I have cited above, show benefits for that group as well.

Returns may be higher for the disadvantaged group, and there may be more evidence of benefits for the disadvantaged, but there may still be net benefits from expanding preschool from disadvantaged groups to the middle-class. As I show in chapter 8 of my book, Investing in Kids, even if we assume significantly lower benefits of preschool for the middle class than for the poor, universal preschool may have higher net economic benefits than targeted preschool.  If we want to affect the quality of our entire labor force, then involving middle-class kids in government educational investments is sensible.

Second, I have already extensively commented in a previous blog post on what the recent evidence from Head Start shows, and how it fits in with other previous research.  What the Wall Street Journal overlooks is that there are good long-term studies of Head Start, with good comparison groups, that show long-term benefits of Head Start. One such study, by Deming, shows strong long-term benefits of Head Start even though the study also shows considerable fading of the program’s test score effects as students progress in K-12.  One possible interpretation of this fading and re-emergence is that it is due to program effects on social skills (“soft skills”) that are more difficult to measure with standardized tests.

The Wall Street Journal again:

“Preschool activists explain away such results by claiming that different programs vary enormously in quality. The White House claims fewer than three in 10 kids are in a “high quality” program. Since we don’t live in Lake Wobegon, well, of course. But it turns out that there are even deep disagreements in the early education literature about how to improve quality, or even how to measure quality in a valid, objective, reliable and fair way.”

My comment: Despite disagreements about how to measure and improve preschool quality, it appears that it is possible for places as diverse as Chicago Public Schools and Tulsa Public Schools to run quality preschool programs that produce large benefits.  It is certainly feasible for regular public agencies to run quality programs. And as reviewed in chapter 5 of my book, we do have considerable evidence on what makes for quality in preschool. Quality requires reasonable class sizes, well-trained teachers paid a competitive wage, and a curriculum that emphasizes both academic skills and social skills.

Continuing with the Wall Street Journal:

“Counting Head Start, special education and state-subsidized preschool, 42% of four-year-olds are now enrolled in a government program. Federal, state and local financing for early learning is closing in on $40 billion a year, double what it was a decade ago. But can anyone say that achievement is twice as good—or even as good?”

My comment: Actually, over the last 15 years or so, trends in student achievement at fourth grade, particularly in mathematics, are up. For example, over the period from 1996 to 2011, fourth grade math achievement in the U.S., according to the National Assessment of Educational Progress, increased by about 0.50 in “effect size” units, which is a gain of about 19 percentiles, or about one grade level in performance. Reading scores at 4th grade have gone up about a third as much, with an effect size gain from the mid-1990s to the present of about 0.15.

It seems unlikely that all of this trend or even most of this trend has been due to preschool. The 2011 assessment results would reflect students in preschool in 2006. In 2006, 20% of all four-year-olds were in state-funded pre-K programs. The mid-1990s test score results would include students who were in preschool around 1991, when about 12% of all four-year-olds were in state-funded pre-K. Therefore, the increase over this period in state-funded pre-K participation is perhaps 8 percentage points of all 4-year-olds. State-funded pre-K might increase a participant’s achievement by an effect size of 0.35. An increase in achievement by an effect size of 0.35 for an extra 8% of the population would increase overall achievement by an effect size of about 0.03.

Therefore, preschool might explain between 5% and 20% of the test score improvement for 4th-graders since the mid-1990s. Why not more? In large part, because while state pre-K participation increased by about 75% in relative terms over this time period, the percentage of all 4-year-olds in state-financed pre-K still only increased by 8 percentage points. The increase in state pre-K is large in terms of program budget increases, but is still modest compared to the size of the entire population.
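The attribution arithmetic of the last two paragraphs, in sketch form (all inputs are the approximations stated above):

```python
prek_effect_size = 0.35  # assumed effect of state pre-K on a participant's scores
extra_share = 0.08       # extra share of all 4-year-olds in state pre-K, ~1991 to ~2006

overall_effect = prek_effect_size * extra_share  # ~0.03 effect size population-wide
math_gain, reading_gain = 0.50, 0.15             # approximate NAEP 4th grade gains, 1996-2011

print(overall_effect / math_gain, overall_effect / reading_gain)  # ~6% and ~19% of the gains
```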

In sum, there is considerable research evidence that large-scale state-funded pre-K programs can be effective for many students. Although many details remain to be explained, President Obama’s proposal seems to aim at expanding these state pre-K programs, not at implementing Head Start for all. Preschool cannot by itself solve all social problems. But if implemented in a high-quality manner on a large scale, it can make a significant difference in helping develop greater opportunities for many children.


Steve Barnett’s take on what the facts show about pre-K

Steve Barnett, Director of the National Institute for Early Education Research, has written a very useful brief report about what research shows about pre-K programs.

This report, which is 15 pages long, is entitled “Getting the Facts Right on Pre-K and the President’s Pre-K Proposal”. The report considers four questions:

“1. Does high-quality pre-K have lasting benefits?
2. What is the evidence for the $7 to $1 return on investment in pre-K?
3. Do non-disadvantaged children benefit from pre-K, and is a targeted or a
universal approach to pre-K more effective?
4. Are large-scale public programs, including Head Start, effective?”

There’s a lot of overlap between what Barnett says and my recent posts responding to various critics of large-scale pre-K.  I think we largely agree on how we see the research evidence.  I would hope that both my specific responses to individual articles, as well as Barnett’s overview of the research, would be useful to a variety of readers.


Link to post on Charles Murray’s analysis of large-scale pre-K

As far as I can tell, my post yesterday was not emailed to subscribers. I am not sure if this is a glitch in the wordpress.com software or servers, or is due to the length of yesterday’s post. So I am writing this new post, and hoping it will be emailed to subscribers.

Yesterday’s post was a point-by-point analysis of Charles Murray’s recent article on large-scale pre-K programs. It was written in response to a reader request.

Yesterday’s post can be found here.


An analysis of Charles Murray’s critique of Obama’s proposal for expanded pre-K

In response to a reader request, I am taking a closer look at a recent article by Charles Murray, entitled “The Shaky Science Behind Obama’s Universal Pre-K”. The article was published on February 20, 2013 by Bloomberg News. Charles Murray is a well-known political scientist who is a scholar at the American Enterprise Institute, and has written widely on many topics, including his argument that many of the problems of poverty have to do with breakdowns in the American family.

The main point of Murray’s article is stated upfront:

“There are just two problems with [Obama’s proposal for expanded pre-K]: The evidence used to support the positive long-term effects of early childhood education is tenuous, even for the most intensive interventions. And for the kind of intervention that can be implemented on a national scale, the evidence is zero.”

My comment: This statement is incorrect, as I will demonstrate below. There is extensive evidence for expanded pre-K education, even for universal pre-K, from studies with large sample sizes of programs that have already been implemented on a large scale.

Murray then goes on to highlight the Perry Preschool program and the Abecedarian program.  He criticizes these studies first on two grounds:

“The main problem is the small size of the samples [for these two programs]…. Another problem is that the evaluations of both Perry Preschool and Abecedarian were overseen by the same committed, well-intentioned people who conducted the demonstration projects.”

My comment: Murray chooses to focus on two programs with small sample sizes and with evaluations run by the researchers who set up the program. Murray overlooks preschool programs that have evaluations with large sample sizes that were conducted by outside researchers.

For example, if we want evidence for preschool’s long-run effects, we can look at the many evaluations done of the Chicago Child-Parent Center program. This program’s evaluations rely on sample sizes of over 1400 children, over ten times the sample size of Perry or Abecedarian. And these CPC evaluations were done by outside researchers. These evaluations of CPC have found strong long-run benefits, with an estimated benefit-cost ratio of over 10 to 1.

If we want to look at evidence for preschool’s short-run or medium-run effects, we have many studies with large sample sizes conducted by outside researchers. These include many studies of state pre-K programs conducted by researchers at the National Institute for Early Education Research. These also include studies of Tennessee’s pre-K program conducted by researchers at Vanderbilt, and studies of North Carolina’s program conducted by researchers at Duke. Finally, Bill Gormley and his colleagues have done a series of studies of the effects of Tulsa’s pre-K program.  All of these studies have found significant benefits of high-quality pre-K programs.

These studies typically look at short-term or medium-term effects of pre-K. However, they do project long-term benefits based on the expected relationships between short-run test score gains and long-term effects on adult outcomes. For example, my recent study of Tulsa with Gormley and Adelstein projected that per dollar invested in pre-K, the present value of earnings would increase by $3 or $4. These large benefit-cost ratios held for both half-day and full-day pre-K programs at age 4, and for both low-income and middle-class kids. This study relied on a large sample size of over 2,500 children, which is over 20 times the sample size of Perry or Abecedarian. And none of us researchers had anything to do with designing or running Tulsa’s pre-K program.

Murray gets quite detailed about his concerns with the small sample size of Perry and Abecedarian:

“The main problem is the small size of the samples. Treatment and control groups work best when the numbers are large enough that idiosyncrasies in the randomization process even out. When you’re dealing with small samples, even small disparities in the treatment and control groups can have large effects on the results. There are reasons to worry that such disparities existed in both programs.”

My comment: What Murray overlooks is that common statistical procedures incorporate the imprecision from randomness with small sample sizes by making the confidence intervals for any estimated effects much larger. As Nobel prize-winning economist James Heckman has pointed out, the small sample size and the resulting large confidence intervals mean that we have to have very large effects in Perry and Abecedarian to have any statistically significant results:

“Charles Murray has made that claim [about small sample size] most recently, and others make it too… [But] a small sample would actually work toward not finding anything. You have a limited number of observations. You would argue that the statistical observations would not be very great, and there would not be much of them. There are methods that account for the small sample size. Size doesn’t matter. It holds up. There’s a lot of robustness here…”

Furthermore, if one is worried about “insiders” doing the research, or about problems with the randomization process, it should be reassuring that Heckman, a prominent “outside researcher”, has reanalyzed the data from Perry and found that the results from the original research hold up, even after accounting for some problems in the initial randomization process. Heckman won his Nobel Prize in large part due to his research on how to overcome “selection bias” in evaluating the effects of public policies.

But Murray states that his main reason for thinking that Perry and Abecedarian only provide tenuous evidence is that he believes that they failed in a replication with a larger sample size:

“The most concrete reason for doubting the wider applicability of the Perry Preschool and Abecedarian effects is this: A large-scale, high-quality replication of the Abecedarian approach failed to achieve much of anything. Called the Infant Health and Development Program, it was begun in 1985. Like Abecedarian, IHDP identified infants at risk of developmental problems because of low birth weight and supplied similarly intensive intervention. Unlike Abecedarian, IHDP had a large sample (377 in the treatment group, 608 in the control group) spread over several sites assessed by independent researchers. IHDP provided a level of early intervention that couldn’t possibly be replicated nationwide, but it gave us by far the most thorough test of intensive early intervention to date.”

My comment: This is a strange critique of Obama’s proposal for expanding state-funded pre-K at age 4. The IHDP provided home visits from birth to age 3, and provided high-quality child care/preschool at ages 1 and 2. However, the program did not provide preschool at ages 3 or 4, so it is hard to see how it is particularly relevant to a proposal to expand preschool at age 4.

Furthermore, the IHDP differed significantly from Abecedarian in many respects, including that Abecedarian included full-time child care and preschool from birth until the children were age 5. In addition, Abecedarian was targeted at high-risk children, whereas the IHDP was targeted at low-birth-weight children.  Although IHDP used the Abecedarian curriculum in child care, the rest of the program was quite different, and it had a very different target group, so it is hardly a close replication of Abecedarian.

In addition, Murray’s negative spin on the effects of IHDP is not shared by the researchers he cites on the program. As of the age-18 follow-up, these researchers conclude that

“The results of this phase of the IHDP suggest a persistent benefit of the intervention for the subset of HLBW [heavier low-birth-weight] participants and absence or even reversal of any intervention effect for the youth born weighing less than or equal to 2000 g.” 

In other words, the program seemed to have statistically significant positive effects on test scores at age 18 for the low-birth-weight participants who were closer to normal birth weights, and therefore more similar to the bulk of the Abecedarian sample.  The researchers went on to suggest that the lack of an effect of the program in the “Lighter Low-Birth-Weight” (LLBW) group (less than 2000 g) might be due to less participation by very low-birth-weight participants in the center-based child care/preschool program at ages 1 and 2.

In addition, the researchers note that as of age 18, they can’t really analyze educational outcomes, unlike other studies.  It also would be impossible at age 18 to directly estimate long-run earnings effects.

Furthermore, they note that in the HLBW groups, the point estimates for benefits in reducing special education costs are similar to the Chicago Child-Parent Center program, although because the HLBW group is less than half the overall sample, the estimates are imprecisely estimated and are not statistically significant. In the Chicago CPC program, these benefits in reducing special education costs are over $5,000 per participant.

The same statistical insignificance occurs for anti-crime effects for the HLBW group in IHDP, although the point estimates for reducing crime are about half those in the CPC study. In the Chicago CPC study, the anti-crime benefits alone had a present value of over $40,000 per participant, so the point estimates in IHDP also point to very large anti-crime benefits, although they are inconclusive because of low sample size.

In other words, it is fair to say that IHDP finds no evidence of long-run benefits for former child participants who started out as “lighter” low-birth weight infants.  But the program does find benefits for heavier low-birth-weight infants. But for this group, the study runs into sample size problems which make it difficult to provide statistically significant estimates for some effects even when the point estimates are consistent with large benefits.    

Finally, if we are going to evaluate early childhood programs in part for what they do for parents, IHDP does show significant effects in boosting maternal employment. When former child participants are age 18, over 15 years after IHDP stopped providing child care services, the mothers in the program group are significantly more likely to be employed. These effects at age 18 are only statistically significant in the lighter low-birth-weight group, for whom the effect is to boost employment rates, when their child is age 18, from 73% to 86%, which is quite sizable. In my examination of the benefits of the Abecedarian program in boosting state residents’ earnings per capita, I found that more than half the earnings benefits of the program came from effects in boosting parents’ earnings short-term and long-term. The Abecedarian program could pass a benefit-cost test based solely on effects on parental earnings.

Murray does concede that early education programs can work:

“The disappointing results from the IHDP don’t mean that early education can’t do any good. Other studies of good technical quality have convinced me that the best early education programs sometimes have positive long-term effects, though much more modest than the ones ascribed to Perry Preschool and Abecedarian.”

My comment: I agree that other preschool programs probably have smaller long-term effects than Perry and Abecedarian.  However, “much more modest” seems a bit of an over-statement.  Adult earnings effects for former child participants are about 19% for Perry and about 14% for Abecedarian (see Bartik, Gormley, and Adelstein for sources for these calculations). But adult earnings effects for the Chicago Child-Parent Center are around 7%.  And projected adult earnings effects for Tulsa for “free lunch” children are 7% for a half-day program at age 4, and 10% for a full-day program at age 4. Increasing average earnings by 7 to 10% is more than a modest effect.

Furthermore, benefit-cost ratios are not necessarily lower for programs other than Perry and Abecedarian. Perry cost over $17,000 per participant, and Abecedarian cost almost $40,000 per participant, compared to a little over $5,000 per year per participant for the Chicago Child-Parent Center program, and around $4,500/$9,000 for a Tulsa half-day/full-day program. (These figures are in 2005-2006 prices, and come from Bartik, Gormley, and Adelstein. The CPC figures are for a one-year program, which was the pattern for 55% of the study participants, and the one-year participants had a higher benefit-cost ratio.) So programs that invest less get lower percentage earnings effects, which is not surprising. In my calculations of effects on state residents’ earnings per capita, a universal pre-K program modeled after CPC, and similar to Tulsa, has a higher benefit-cost ratio than the Abecedarian program.

But Murray goes on to claim that the best early education programs are not scalable:

“That leaves us with one last problem: None of those first-rate programs are replicable on a large scale. The kind of nationwide expansion of early education that Obama wants won’t have the highly motivated administrators and hand-picked staffs that demonstration projects enjoy, and the per-child cost of the interventions on the Perry Preschool and Abecedarian model are prohibitively high. If you’re going to have a national program, you’re going to get the kind of early education that Head Start provides.”

My comment: Murray doesn’t say what “other studies” he’s including beyond Perry and Abecedarian. However, this statement ignores that many “first-rate programs” that have been evaluated have already been implemented on a large-scale, without “hand-picked” administrators and staff. This includes the Chicago program, as well as the various state programs, such as the Oklahoma program that funds Tulsa’s program. If we’re going to have a national pre-K program for 4-year olds that is primarily focused on kindergarten readiness in terms of both cognitive skills and social skills, we can choose to model that program after these large-scale successful state and city programs.

Furthermore, these large-scale programs have less than one-third the cost of Perry and perhaps one-eighth the cost of Abecedarian (see cost figures above).  These costs are not prohibitively high. At about $5,000 per participant per year, I have estimated that a high-quality half-day pre-K program for 4-year-olds that was universal might cost $14 billion annually. This is around $50 per U.S. resident, which is affordable either for the federal government or for state governments.
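A sketch of that cost arithmetic. The cohort size and take-up rate below are my own illustrative assumptions, chosen to reproduce the rough $14 billion figure:

```python
cost_per_child = 5000       # approx. annual cost of a high-quality half-day program
four_year_olds = 4_000_000  # rough size of a U.S. birth cohort (illustrative)
take_up = 0.70              # assumed share of families enrolling (illustrative)

total_cost = cost_per_child * four_year_olds * take_up
print(total_cost / 1e9)          # ~$14 billion per year
print(total_cost / 315_000_000)  # ~$45-$50 per U.S. resident
```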

In other words, a national program need not be modeled after Head Start in design or costs, but rather can follow these successful and affordable state and city models for pre-K services.  

Murray then goes on to summarize the recent third-grade follow-up results of the national Head Start experiment:

“Of the 47 outcome measures reported separately for the 3- year-old and 4-year-old cohorts that were selected for the treatment group, 94 separate results in all, only six of them showed a statistically significant difference between the treatment and control group at the .05 level of probability — just a little more than the number you would expect to occur by chance. The evaluators, recognizing this, applied a statistical test that guards against such “false discoveries.” Out of the 94 measures, just two survived that test, one positive and one negative.”

My comment: I’ve already commented extensively on Head Start in several blog posts. Without repeating all that analysis in full detail, there are two things that this summary overlooks:

First, the Head Start study is implicitly comparing effects of Head Start with the effects of whatever activities were engaged in by the control group.  This included preschool. According to the latest report, “Approximately 60 percent of the control group children participated in child care or early education programs during the first year of the study, with 13.8 percent of the 4-year-olds in the control group and 17.8 percent of the 3-year-olds in the control group finding their way into Head Start during the year.”

If some of these alternative preschool programs are highly effective state or local pre-K programs, this may significantly reduce any net Head Start effect. However, such a lower net Head Start effect does not imply that preschool doesn’t work compared to no preschool.

Second, this summary ignores that some good Head Start studies have found significant fade out of test score effects of Head Start, followed by a bounceback of benefits at older ages and adulthood. For example, Deming’s study of Head Start found that initial effects of Head Start on test scores at ages 5 and 6 faded by 60% by ages 11-14. But these effects were still consistent with much larger effects on adult outcomes, which would predict adult earnings effects of Head Start of about 11%.

Murray goes on to make a somewhat puzzling emphasis on one aspect of the Head Start study:

“One aspect of the Head Start study deserves elaboration. The results I gave refer to the sample of children who were selected to be part of the treatment group. But 15 percent of the 3-year-old cohort and 20 percent of the 4-year-old cohort were no-shows — a provocative finding in itself. When the analysis is limited to children who actually participated in Head Start, some of those outcomes do become statistically significant, though still substantively small. But keep in mind that we’re looking at selection artifacts: Children who end up coming to the program every day have cognitive, emotional or parental assets going for them that children who fail to participate don’t have. This means that if somehow the no-shows could be forced to attend, you couldn’t expect them to get the same benefit as those who participated voluntarily. If you’re asking what impact we could expect by making Head Start available to all the nation’s children who might need it, you have to make the calculation based on giving access to the service.”

My comment: Murray’s discussion here is puzzling.  We can adjust the Head Start estimates from what are called “Intent to Treat” (ITT) estimates to “Impact on the Treated” (IoT) estimates.  This basically divides the ITT estimates by the difference in the proportion participating in Head Start in the treatment group vs. the control group. This involves blowing up the estimates by about 40 to 50%.  For example, for the 4-year-old cohort, 80% of the treatment group ended up participating in Head Start, vs. 14% of the control group. The difference is 66% in participation in Head Start. We assume that the ITT estimates are solely due to this participation difference, and we therefore divide the ITT estimate for 4-year olds by 0.66 to get the effect of going from no Head Start to Head Start participation.
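A minimal sketch of that ITT-to-IoT adjustment for the 4-year-old cohort. The participation shares are the ones just cited; the ITT effect plugged in is purely illustrative, not a number from the report:

```python
itt_effect = 0.10             # illustrative intent-to-treat estimate (not from the report)
treated_participation = 0.80  # share of the treatment group attending Head Start
control_participation = 0.14  # share of the control group attending Head Start anyway

gap = treated_participation - control_participation  # 0.66
iot_effect = itt_effect / gap
print(gap, iot_effect)        # the estimate is scaled up by roughly 50%
```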

But contrary to Murray, this has no implications for statistical significance. It simply makes both the estimated effects and standard errors of those effects larger by some percentage. This is noted in the Head Start report on page 89: “There is no change in the statistical significance of the estimates.”     

Murray is right that “Impact on the Treated” estimates reflect effects for people who choose to participate in Head Start, and may not translate into effects on children forced to participate in Head Start. But it is unclear what relevance this would have to some hypothetical program that would expand voluntary access to Head Start. No one is proposing mandatory preschool.

Murray then summarizes his case as follows:

“The take-away from the story of early childhood education is that the very best programs probably do a modest amount of good in the long run, while the early education program that can feasibly be deployed on a national scale, Head Start, has never proved long-term results in half a century of existence.”

My comment: As stated above, there are many proven large-scale pre-K programs that are not Head Start and that show much more than modest benefits in the long-term.

In addition, Murray overlooks the many rigorous Head Start studies that show long-term benefits, including studies from Deming, Ludwig/Miller, and Garcia/Thomas/Currie. I’ve discussed this evidence in previous blog posts.

Murray might respond that these other studies are not random assignment experiments. But they have very good comparison groups for Head Start participants. By “good comparison groups”, I mean that the non-participants in Head Start are likely to be quite similar in observed and unobserved characteristics to the Head Start participants.  Deming and Garcia/Thomas/Currie compare siblings who differ in Head Start participation.  Ludwig/Miller compare counties that differed in whether they received help from the federal government in preparing their Head Start application back in the 1960s, based on whether the county was below or above some poverty threshold for such assistance.   These are rigorous methodologies.

While random assignment would be ideal in a world with infinite resources and time, random assignment is expensive and cumbersome, and by definition takes a long time to get long-term results. We should not throw away the results of other rigorous studies just because they lack random assignment.

Murray then goes on to summarize his case more bluntly:

“Let me rephrase this more starkly: As of 2013, no one knows how to use government programs to provide large numbers of small children who are not flourishing with what they need. It’s not a matter of money. We just don’t know how.”

My comment: For all the reasons outlined above, I think this is incorrect.  Many state and local areas are already implementing large-scale pre-K programs that have good evidence for both short-run and long-run benefits.

Your average American state government, or local school district, can successfully carry out large-scale preschool programs. To do so, the state or local agency must be willing to spend a reasonable amount per student, employ well-trained and well-paid teachers, keep class sizes reasonable, and use a good curriculum that focuses on both cognitive and social skills.  But if those elements of quality are present, pre-K programs can achieve significant short-run and long-run benefits both for former participants, and for our economy and society as a whole.

Posted in Early childhood programs | 14 Comments

What do we know about Head Start’s effectiveness?

An ongoing policy dispute is about how effective Head Start is as a preschool program. Head Start has other goals, for example improving public health. However, an important issue is how Head Start’s effects on kindergarten readiness, K-12 test scores, and long-run educational attainment and earnings compare with the effects of other pre-K programs, or with no pre-K program at all.

I’ve written about Head Start’s effectiveness in previous blog posts. However, I wanted to provide some reflections based on some of the latest research, including the recently released 3rd grade results for the random assignment Head Start study, and the recent meta-analysis by Shager et al. of Head Start’s immediate cognitive effects. It seems to me that it is useful to compare these results with Deming’s study from a few years ago that examined the time pattern of Head Start’s results by comparing siblings who differed in Head Start participation.

What do we know about Head Start’s effects?

(1)    Head Start significantly improves cognitive test scores at the end of Head Start, or at kindergarten entrance, compared to no preschool at all.  The estimated effect size is on the order of 0.31 (see Shager et al.).  This “effect size” metric is common educational statistics jargon: it compares the effect of Head Start with a measure of variation in test scores (the test score standard deviation) across students at the end of Head Start or the beginning of kindergarten. An effect size of this magnitude means that Head Start, compared to no preschool, boosts students’ cognitive test scores by about 40% more than what they would have gained during the preschool year without Head Start or other preschool. (This calculation is based upon comparing students’ average learning with age independent of preschool attendance, as calculated in Bartik, Gormley, and Adelstein.)
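As a rough illustration of that arithmetic, suppose that children who attend no preschool gain about 0.78 standard deviations on these tests over the preschool year from normal development alone. That 0.78 is my back-of-the-envelope assumption for this sketch, not a figure taken from Bartik, Gormley, and Adelstein:

```python
# Translating an effect size into "percent of a normal year's learning."
effect_size = 0.31        # Head Start effect vs. no preschool (Shager et al.)
normal_year_gain = 0.78   # assumed gain, in standard deviations, from a year
                          # of normal development without preschool (illustrative)

extra_learning = effect_size / normal_year_gain
print(f"Head Start adds about {extra_learning:.0%} to a normal year's gain")  # ~40%
```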

(2)    Head Start’s immediate effects are significantly smaller when compared with a control group in which a sizable percentage attends other preschools. Shager et al. find that Head Start’s immediate average effects might be reduced by two-thirds when a high percentage of the control group is enrolled in other preschool programs.  This might occur because at least some of these other preschool programs have larger immediate effects than Head Start on cognitive test scores.  For example, Gormley et al. find that the Tulsa state-funded “universal” pre-K program, compared to Tulsa’s Head Start programs, has about twice the immediate effect on cognitive test scores at kindergarten entrance.

(3)    Head Start’s immediate cognitive test score effects decay quite a bit over time when measured in effect size units.  However, studies differ somewhat on when this decay occurs and on its magnitude.  This pattern may reflect the timing of each study and what comparison is being made.

For example, the recent Head Start experiment examined Head Start’s effects for children who participated in Head Start in 2002-2003. These effects were also measured relative to a control group that participated in other preschools, which may reduce both immediate and later effects. This study found that the immediate effect, at the end of preschool, of age 4 Head Start on three cognitive tests had an average effect size of 0.22. (This averages the “Impact on the Treated” effect sizes for the following three tests, which are consistently administered over time: the PPVT test, the WJ-III Letter-Word ID test, and the WJ-III Applied Problems test.) The end-of-third-grade effects on these same three tests had an average effect size of 0.06, over 70% lower, and none of the three test score effects was statistically significant at the end of third grade.
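The percentage decline quoted above is just the ratio of the two reported averages:

```python
# Decay of the average effect size between the end of pre-K and third grade.
end_of_prek_avg = 0.22    # average IoT effect size across the three tests
third_grade_avg = 0.06    # average effect size at the end of third grade

decay = 1 - third_grade_avg / end_of_prek_avg
print(f"Decline in average effect size: {decay:.0%}")  # about 73%, i.e. over 70%
```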

Deming’s study of Head Start compared siblings who differed in Head Start participation vs. no preschool. The rationale for this methodology is that although Head Start participation is not “randomly” assigned, comparing siblings should hold constant many unobserved family factors. The participation Deming examines mostly occurred during the 1980s. He finds that Head Start has initial effects of about a 0.15 effect size, compared to no preschool. Unlike in the recent Head Start experiment, these test score effects were mostly maintained when the former Head Start children were ages 7-10, declining to an effect size of 0.13. But the test score effects deteriorated at ages 11-14, to an effect size of only 0.06, and this small effect is statistically insignificant.

Why the difference in this time pattern of decay? That is hard to say. It could be due to the different methodologies, random assignment experiment versus comparing siblings with different participation in Head Start. However, Shager et al. do not find that the immediate effects of Head Start are lower in experimental studies vs. “quasi-experimental” studies such as Deming’s. And Deming’s study actually finds lower immediate effects of Head Start than the recent random assignment experiment.   

The differences could be due to the time period of the study, and the comparison group. Effects of Head Start may deteriorate more quickly in a later period when compared to a group for whom other preschool options have improved over time.

(4)    Even after Head Start’s test score effects have significantly decreased, Deming’s study suggests that long-run effects of Head Start may be large. By large, I mean that these effects suggest that Head Start easily passes a benefit-cost test.

For example, Deming finds that Head Start is estimated to have an effect size of 0.23 on an index of various measures, at age 19 or above, of educational attainment, crime involvement, employment status, unwed teen parenthood, and health.  He predicts that this increase in the adult outcomes index translates into about an 11% increase in future wages. This large predicted effect on wages occurs even though the estimated effect of Head Start on test scores at ages 11-14 is small and statistically insignificant. The real rate of return to Head Start is calculated to be 7.9%, which compares very favorably to many social and private investments.

Would this “bounceback” of long-term effects also occur for participants in the current Head Start random assignment experiment? That’s hard to say, in part because the current experiment shows more immediate deterioration of Head Start’s test score effects, and because the current experiment is comparing Head Start to a control group that participated in other preschools.

We do know that we don’t need to have long-term effects of this magnitude for Head Start to pass a benefit-cost test. Deming calculates that at a 3% real discount rate, which is often used in social benefit-cost analyses, Head Start would pass a benefit-cost test even if adult effects were 70% lower than what he estimates.  Boosting adult earnings by even a few percentage points has large social benefits.
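To see why, consider a stylized present-value calculation. Every dollar figure below is hypothetical, for illustration only; the point is simply that even a modest percentage boost to forty years of earnings, discounted at 3%, is large relative to one year of program costs:

```python
# Stylized benefit-cost check: discount a stream of adult earnings gains
# back to age 4 at a 3% real rate and compare it to a one-year program
# cost. All dollar figures are hypothetical, for illustration only.
discount_rate = 0.03
program_cost = 8000.0            # hypothetical Head Start cost per child
baseline_earnings = 40000.0      # hypothetical real annual earnings, ages 25-64
earnings_gain_pct = 0.11 * 0.30  # Deming-style 11% gain, cut by 70%

pv_benefits = sum(
    baseline_earnings * earnings_gain_pct / (1 + discount_rate) ** (age - 4)
    for age in range(25, 65)     # earnings received from age 25 through 64
)
print(f"PV of earnings benefits: ${pv_benefits:,.0f}")
print(f"Benefit-cost ratio: {pv_benefits / program_cost:.2f}")  # still above 1
```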

The bounceback of long-run effects, after early effects on test scores have significantly decreased, is not unusual in early childhood programs. For example, contrary to recent claims by Whitehurst,   the Perry Preschool program shows this pattern, although with a somewhat slower deterioration in test score effects. In the Perry study, performance on various standardized tests had declined to very low effect sizes by age 8. For example, the Perry treatment group, compared to the control group, showed an effect size at age 8 of only 0.06 on the PPVT test, after having an effect size of 0.91 after two years of preschool.  (See Table 3.3 of the Age 40 follow-up to Perry.)  (School achievement tests showed greater long-run effects in the Perry study, but these school-based achievement tests are not the sort of measure used in the Head Start experiment, probably because they are difficult to compile and compare across diverse schools.) Despite this deterioration in estimated standardized test score effects of the Perry program, the Perry study showed strong long-run benefits for adult outcomes.  

As another example, Chetty et al.’s recent analysis of the long-run effects of kindergarten class quality shows significant deterioration of test score effects, followed by a bounceback in effects on adult earnings.  Based on their research, the effects of kindergarten class quality on 3rd grade test scores are around one-sixth its effects at the end of kindergarten.  (See their Figure VI.) But kindergarten class quality ends up having large effects on adult earnings. These adult earnings effects can be more accurately predicted by kindergarten class quality’s effects on end-of-kindergarten test scores than by its effects on test scores in subsequent grades.

This bounceback may be due to harder-to-measure effects on soft skills, as hypothesized by Heckman.  There is some support for this interpretation in Deming’s results. Deming finds sizable effects of Head Start on reducing grade retention and learning disability diagnoses, which might help explain Head Start’s long-term benefits. 

What should be made of all this? First, the Head Start estimates don’t indicate that “preschool doesn’t work”.  They do suggest that the average Head Start center’s performance can sometimes be disappointing when compared with some of the best state pre-K programs.

Second, the estimates do suggest that Head Start needs to improve its average performance if the goal is to rank with the best public preschools in effects on cognitive outcomes.

Third, the results suggest caution in generalizing from medium-run effects of Head Start to effects on adult outcomes for former participants. Effects that disappear at one age can reappear at other ages.  Sometimes the short-run effects of early childhood interventions are the best predictors of long-run effects.

Posted in Early childhood program design issues, Early childhood programs | 2 Comments

Fact-checking FactCheck on preschool

FactCheck’s recent column criticizing President Obama’s claims about his preschool program gave a misleading description of the overall research evidence on preschool.

FactCheck describes itself as “a project of the Annenberg Public Policy Center of the University of Pennsylvania”, and says that it is “a nonpartisan, nonprofit ‘consumer advocate’ for voters that aims to reduce the level of deception and confusion in U.S. politics.”

According to FactCheck, “In his State of the Union address, Obama exaggerated the effects of universal preschool by comparing results from small, expensive programs targeted to disadvantaged youth to a universal program for which such results are unproven.”

There are two major problems with FactCheck’s coverage.

First, FactCheck’s implication that there is no evidence of high benefit-cost ratios for programs of similar costs to large-scale state pre-K programs is incorrect.

FactCheck correctly points out that the Perry Preschool program, which shows high benefit-cost ratios, is “far different than any universal preschool program currently run by any state”, and in particular, that Perry’s $19,000 per student cost is far greater than that of most states’ preschool programs.

However, FactCheck ignores that other preschool programs have very high benefit-cost ratios yet are of similar cost to large-scale state preschool programs. For example, the Chicago Child-Parent Center program is of similar cost to these state programs, and has very high benefit-cost ratios, based on data collected on former participants up to their late 20s.

In Chicago, about half the students participated for one year at age 4 in a half-day program, and about half for two years at both ages 3 and 4. The benefit-cost ratio was higher for the one-year program: an estimated 13.58 for the one-year version of the CPC program (see Table 5 of a 2011 paper by Arthur Reynolds and his colleagues).

One year of this program had a cost of $5,597, according to this same paper.  This is comparable to the $6,100 that FactCheck cites for the average cost of one year of preschool in different states, based on a chapter by Duncan, Ludwig, and Magnuson, although some of these figures may be for a mix of half-day and full-day programs.  The Chicago figure is greater than the $4,403 per student spent in Tulsa, Oklahoma for a half-day program (Bartik, Gormley, and Adelstein).  But our Tulsa figure was in 2005-2006 prices, whereas the Chicago CPC figures are in 2007 prices, and Chicago prices are higher than Tulsa prices by about 14%.  If we adjust our Tulsa figures to “year 2007 Chicago prices”, Tulsa costs for a half-day program would be $5,229 per student, which is not much below CPC’s costs.
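For readers who want to check the adjustment, the arithmetic is as follows; the inflation factor is backed out to match the $5,229 figure and should be treated as approximate:

```python
# Putting the Tulsa half-day cost on a "year 2007 Chicago prices" basis.
tulsa_cost = 4403.0          # per-student cost in 2005-2006 prices
inflation_factor = 1.042     # approximate price growth, 2005-06 to 2007 (assumed)
chicago_price_factor = 1.14  # Chicago prices about 14% above Tulsa prices

adjusted_cost = tulsa_cost * inflation_factor * chicago_price_factor
print(f"Tulsa cost in 2007 Chicago prices: ${adjusted_cost:,.0f}")  # about $5,230
```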

In addition, the Chicago program was run at a large scale, similar to state programs, so it is not true that we do not have good research evidence for large-scale programs.

Second, FactCheck is incorrect to imply that there is no good evidence of high benefit-cost ratios for middle-class students.

They refer only to studies that “suggest the benefits [from preschool] that accrue to middle income students are far less dramatic”. They quoted Rich Neimand, of the Neimand Collaborative, as saying that “To the best of our knowledge, there has been no evidence-based scientific research on the value of universal preschool. However, that doesn’t necessarily mean that universal has no value.”

This ignores the research done on Tulsa by Bill Gormley, Shirley Adelstein, and me. To the best of my knowledge, this is the only research on universal pre-K programs to compare preschool’s effects across low-income children and middle-class children using a well-regarded methodology (regression discontinuity).  Regression discontinuity is not quite as rigorous as random assignment, but it does provide good evidence.

The intuition is that we look for jumps in test scores comparing students who were just old enough to attend pre-K the previous year, and who are now entering kindergarten, with children who just missed the cut-off for 4-year-old pre-K the previous year, and who therefore are just entering pre-K. The argument is that these children are of almost the same age, and therefore should be similar in observed and unobserved characteristics. In fact, the data show the children are similar in characteristics that we can observe. The main difference is that one group of children has had a year of high-quality preschool, and the other group has not.

The limitation of our methodology is that we can only directly estimate the effects of universal preschool on kindergarten entrance scores for low-income and middle-class children. However, we use the available data on how kindergarten test scores predict adult earnings to do a partial benefit-cost analysis of universal pre-K. The benefit-cost analysis is partial in that we only look at predicted effects on future earnings. We ignore effects of preschool in reducing crime, which in many preschool studies are equal to or greater in dollar value than benefits in increasing earnings.

In our study, the “partial”, earnings-based benefit-cost ratio for half-day preschool for the lowest-income children, who are eligible for a free lunch, is 4.08 — that is, each dollar invested in preschool increases the present value of these children’s future earnings by over four dollars. For more middle-class children, who are ineligible for any lunch subsidy, the earnings-based benefit-cost ratio is 3.44. This is below the benefit-cost ratio for low-income children, but not much below.

One could argue that the total social benefits to low-income children from Tulsa’s preschool program may be greater, and in fact we make that argument in our article. We point out that percentage effects on expected future earnings are greater for low-income children, which may mean that in some sense these dollar earnings gains are more socially valuable. In addition, it is possible that the anti-crime benefits of preschool may be greater for low-income children, although there is no direct evidence for this.

However, contrary to FactCheck, I would regard a benefit-cost ratio of 3.44 for earnings benefits for middle-class children as “dramatic benefits”. The estimated real “rate of return” to investing in preschool for middle-class children, based on earnings benefits alone, is 6.7%. This compares favorably with many other social investments and private investments.
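For readers curious how a benefit-cost ratio maps into a rate of return, here is a stylized sketch: it finds the real discount rate at which a stream of earnings gains just repays the program cost. The annual gain below is a made-up number calibrated so the sketch lands near 6.7%; the estimate in our paper rests on a more detailed earnings projection.

```python
# Stylized internal-rate-of-return calculation: find the real discount
# rate at which the present value at age 4 of earnings gains received
# at ages 25-64 just equals the program cost.
program_cost = 4403.0   # Tulsa half-day cost per student (2005-2006 prices)
annual_gain = 1165.0    # hypothetical real annual earnings gain, calibrated
                        # so this sketch lands near the 6.7% in the text

def pv_of_gains(rate):
    # Present value at age 4 of gains received each year from age 25 to 64.
    return sum(annual_gain / (1 + rate) ** (age - 4) for age in range(25, 65))

lo, hi = 0.0001, 0.20   # bracket the breakeven rate, then bisect
for _ in range(60):
    mid = (lo + hi) / 2
    if pv_of_gains(mid) > program_cost:
        lo = mid        # benefits still exceed cost, so the rate can go higher
    else:
        hi = mid
print(f"Implied real rate of return: {lo:.1%}")  # roughly 6.7%
```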

It is fair to say that we have more long-term evidence from more rigorous studies for the benefits of preschool for low-income children.  But we do have some good evidence for a high rate of return to preschool programs for middle-class children.  And it’s fair to say that some of our evidence for high returns to preschool comes from small programs such as the Perry Preschool program. But we also have evidence from large-scale programs.

It is perhaps unreasonable to expect every social and educational program to have evidence from large-scale randomized experiments involving middle-class children. This is not the kind of research that government agencies or foundations tend to fund – such a large-scale program would be expensive, and these agencies tend to focus on needier children with their research dollars, for understandable reasons.

If the standard of proof for benefits for middle-class children is large-scale randomized experiments, we are rarely if ever going to have such evidence.  I sometimes wonder what would have happened to the “common school” movement of the 19th century if we had demanded that Horace Mann produce random assignment evidence from large-scale programs that the common school was needed for children of all income classes.

Posted in Distribution of benefits, Early childhood program design issues, Early childhood programs | 1 Comment

NPR’s spin on universal preschool is unduly negative

NPR’s Morning Edition on February 18, 2013 had an interview by NPR host Linda Wertheimer with NPR science correspondent Shankar Vedantam that gave an unduly negative spin to what research shows about the effectiveness of universal preschool.

The program began by quoting President Obama from his State of the Union address:

“Every dollar we invest in high-quality early childhood education can save more than seven dollars later on — by boosting graduation rates, reducing teen pregnancy, even reducing violent crime.  In states that make it a priority to educate our youngest children, like Georgia or Oklahoma, studies show students grow up more likely to read and do math at grade level, graduate high school, hold a job, form more stable families of their own.  We know this works.”

Linda Wertheimer went on to ask Shankar Vedantam the following:

“President Obama says that spending money in preschool gives us seven times our investment. Where does that number come from?”

Shankar Vedantam responded:

“Those numbers come from a couple of studies called the Perry Preschool program and the Abecedarian program …that targeted very high-quality and fairly expensive interventions at very disadvantaged children.”

My comment: I think there is also very good and more relevant long-term evidence of success from the Chicago Child-Parent Center program. This program is much cheaper than Perry or the Abecedarian program. The Chicago CPC program cost about $5,400 for a one-year preschool program at age 4 that was shown to have a high benefit-cost ratio. Perry cost over $17,000 for the two-year program that was tested; the Abecedarian program cost around $40,000 for a five-year program that was tested. (See my paper on Tulsa with Gormley and Adelstein for sources for these cost figures.)

Vedantam:

“And what those programs found — they followed these children out not just for years but for decades — is that the programs didn’t have just cognitive benefits, in other words improvements in performance in academic scores, but they had life benefits, they … reduced the teen pregnancy rate, they reduced the crime rate, they had huge benefits later on. So the President is on very solid footing when he talks about the return on investment when it comes to those narrowly targeted programs. But when he rhetorically links those programs with larger programs, such as the experiences in states such as Oklahoma and Georgia, in some ways he’s venturing off the ledge of science.  There have been studies looking at the experience of those states, and I have not seen any data that suggests that return on investment in those states is anywhere close to seven times our investment.”

My comment: My study with Gormley and Adelstein found large effects of the Tulsa, Oklahoma program on kindergarten entrance tests for both low-income and middle-income students.  We used these short-term effects on test scores to project long-term effects that would increase the present value of earnings by three to four times the investment. This was true for all income groups of students.  In many of the studies that do have long-term data, the anti-crime benefits are often greater than the earnings benefits when calculated in dollar terms. Therefore, it would not surprise me if the Tulsa program ended up having a benefit-cost ratio, when all benefits are included, of seven to one or more.  This is more likely for the low-income children in the Tulsa program, for whom it seems reasonable to assume considerable anti-crime benefits. But even if middle-class children did not have any reduction at all in their crime rates, which is an extremely conservative assumption, we would still have benefits of about 3 dollars per dollar invested for these children.

Vedantam:

“I want to emphasize that studies in the states have found that the programs do have benefits, they just don’t have benefits of the same magnitude as the highly targeted programs.  I spoke with Bill Gormley, he’s a researcher at Georgetown University. He said there were very clearly cognitive benefits among children in Tulsa, Oklahoma, who he studied in one of his studies.”

My comment: These cognitive benefits can be used to predict future “life benefits”. Studies such as those by Harvard Professor Raj Chetty and his colleagues show a connection between short-term cognitive benefits of educational interventions, and future life benefits such as increased adult earnings.  Applying these findings to Tulsa preschool requires some extrapolation, but not unreasonable extrapolation.

Vedantam:

 “Remember that the programs that targeted the highly disadvantaged children were highly focused, they focused on the children who were most in need, and they gave them the very best resources. When you ramp up a program and you make it universal and you make it state-wide, what happens is some of the children who end up using the program are children who don’t really need the program that much, they would have turned out fine anyway.”

My comment: It is also possible that some highly disadvantaged children are so disadvantaged that the preschool intervention is insufficient to make a difference. On the other hand, some middle-class children may have more limited kindergarten readiness issues with social skills that can be effectively addressed by preschool.  Therefore, theoretically it’s hard to tell who will benefit the most from high-quality preschool.

Vedantam:

“And the second thing that happens [when you go from small targeted programs to universal, statewide programs], unless your budget expands astronomically, is that the quality of the programs tends to go down. “

My comment: Quality does cost money. But the cost is not astronomical.  We can get results with a half-day school-year program for 4-year-olds that costs about $5,000 per child. (See the Institute for Women’s Policy Research study entitled “Meaningful Investments in Pre-K”.)   If we project out these costs to the roughly 3 million additional 4-year olds we would need to enroll in high-quality preschool to get to “universal” preschool, the additional cost is about $15 billion per year. (See chapter 4 of my book Investing in Kids. The Center for American Progress has a more expansive program for three and four year olds that appears to have a full implementation cost of about $25 billion per year.)  $15 billion is about $50 per capita. This is an amount that is affordable by either the federal government, or by state governments on their own. We can see that it is affordable in practice, because states such as Oklahoma already do fund high-quality preschool as an entitlement for all four-year-olds.
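The back-of-the-envelope arithmetic behind those numbers is straightforward; the U.S. population figure is approximate:

```python
# Scaling per-child pre-K costs up to a national price tag.
cost_per_child = 5000            # half-day, school-year program for a 4-year-old
additional_children = 3_000_000  # additional 4-year-olds to reach "universal"
us_population = 300_000_000      # approximate U.S. population

total_cost = cost_per_child * additional_children  # $15 billion per year
per_capita = total_cost / us_population            # about $50 per person
print(f"Total: ${total_cost / 1e9:.0f} billion per year; "
      f"per capita: ${per_capita:.0f}")
```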

Vedantam then responds to a question about Head Start:

“I think Head Start has done wonderful things, but in terms of the return on investment and how effective it is at boosting scores over the long-term, there are significant questions that have been raised. There has been a Congressionally mandated study that just came out a couple  months ago that found that even though the programs in general were pretty good, by the time children reached 3rd grade, there was really no difference between the children who had been through the Head Start program and those who had not.”

My comment: This misses that the research on the impacts of Head Start is quite mixed. There are good studies that show long-run effects of Head Start. I have reviewed this in detail in previous blog posts.

Vedantam:

“In some ways I think that the issue of Head Start raises the larger question which is there’s a tension here… between what’s politically popular,  which is parents love these programs, they love subsidized child care, they love preschool, and lots of people want to sign up for them. Scientifically, though, what seems to have the biggest benefit is when you target your limited dollars at the people most in need… How you square the popularity of a program that reaches all families versus the benefits of a program that [is] more narrowly targeted, I don’t know how you square that politically.”

My comment: I think this is an unfortunate wrap-up that may be attributable to this being an unscripted interview. Head Start is targeted. It is arguable that part of the problem with Head Start is that, because it is so targeted on the poor, there is insufficient political pressure to continuously improve the quality of the program.  The notion that more targeted programs will result in more resources for low-income children may not be true, because such targeted programs attract less political support.

In addition, none of Mr. Vedantam’s comments address possible peer effects in preschool. Is it really optimal to put low-income kids in income-segregated preschool classrooms? This is another issue to consider in analyzing the pros and cons of targeted versus universal programs.

In sum, I don’t think that Mr. Vedantam’s report provides a balanced assessment of the research evidence for targeted versus universal preschool. It is fair to say that there is more long-term evidence from random assignment studies of preschool’s benefits for low-income students. We don’t have long-term random assignment studies of preschool’s effects on middle-income students.  But as the cliché goes, absence of evidence is not evidence of absence. And we have good evidence that large-scale universal state preschool programs produce short-term benefits for both low-income and middle-class children that are consistent with sizable long-term benefits. The costs of implementing such universal programs are not astronomical, and there are some political as well as substantive advantages to having income-integrated programs.

Posted in Distribution of benefits, Early childhood program design issues, Early childhood programs | 1 Comment

Research supports the effectiveness of many state and local pre-K programs

In the debate over President Obama’s pre-K proposal, one important issue is whether pre-K programs can work on a large-scale, not just in small “hothouse” programs run by researchers.  A closely-related issue is whether pre-K works for middle-class children, or just for low-income children.

In this policy debate, one recent study that has received some attention is a short piece by Russ Whitehurst, director of the Brown Center on Education Policy at the Brookings Institution, and former director of the Institute for Education Sciences within the U.S. Department of Education. Dr. Whitehurst focuses on the evidence from research on state pre-K programs. He argues that this state-focused research only provides weak evidence for the efficacy of pre-K:  he characterizes the studies as “thin empirical gruel”.

Dr. Whitehurst’s main argument is that the state pre-K programs have not been evaluated using random assignment experiments. He argues that there are various methodological problems with the current research studies on state pre-K programs.

In an ideal world for researchers, we would want as much evidence as possible from studies with random assignment of children to either receive or be denied pre-K services. With such random assignment, if implemented perfectly, the children who receive pre-K services and the children who do not will on average be the same in both observed and unobserved characteristics. Any differences in outcomes between the pre-K and non-pre-K groups can then plausibly be attributed to the pre-K program, not to which types of families select pre-K services or how the pre-K program selects children.  However, in the real world, it is not always possible to implement random assignment experiments perfectly for all programs in all places. For example, if not all participants in a random assignment experiment have usable data, differential attrition between the two groups (children who did and did not receive pre-K services) may lead to differences in unobserved characteristics that can bias estimated program effects. So we also need to look to other methodologies that can provide useful research evidence.

In my opinion, Dr. Whitehurst significantly understates the quantity and quality of the existing research evidence for the effectiveness of large-scale state and local pre-K programs.  Although these research studies do not typically use random assignment, they often rely on “natural experiments”.  In these “natural experiments”, the assignment of children to receive pre-K program services either varies with factors that are plausibly unrelated to unobserved characteristics of children, or we do a good job of controlling for children’s characteristics.

These “natural experiments” provide evidence that is almost as reliable as the evidence from real-world random assignment experiments, which have their own imperfections. A perfectly implemented random assignment experiment may be the “gold standard” for research credibility. But a well-done study of a natural experiment can provide “silver standard” evidence.  When such “silver standard” evidence consistently points to the effectiveness of large-scale pre-K programs, the quantity and quality of such evidence provides a sufficient basis for policymakers to implement large-scale pre-K programs with reasonable confidence of success.

Dr. Whitehurst’s review of the large-scale pre-K studies overlooks many studies that provide relevant evidence. Most importantly, he omits evidence from the Chicago Child-Parent Center program. Studies of this program rely on differences in access to the program’s pre-K services across similar low-income neighborhoods in Chicago – a “natural experiment”.  A series of studies of this program have found evidence that the Child-Parent Center Program reduces crime and increases educational attainment and earnings. Dr. Whitehurst also omits favorable evidence for the effectiveness of various large-scale state pre-K programs, for example in Arkansas, New Jersey, and Tennessee. The Arkansas and New Jersey studies try to control for observable characteristics in constructing comparison groups. The Tennessee study used a random assignment experiment, the “gold standard”, but had considerable problems with missing data for some study participants.

Dr. Whitehurst is also too quick to dismiss the many studies that use a technique called “regression discontinuity” to evaluate pre-K programs’ effects on kindergarten readiness. In these studies, the researcher uses information on two groups of students: one group that has just started the pre-K program, and another group that has just started kindergarten after completing the pre-K program. The two groups take the same assessment tests.  The research methodology is to examine the data to see if there is a large “jump” in test scores, above what would be predicted due to aging, between students who were a few days too young to enter pre-K the previous year, and are therefore just starting pre-K, and other students who just made the age cut-off for pre-K the previous year, and who are therefore starting kindergarten this year. The argument for this regression discontinuity methodology is that these children, who are only a little bit apart in age, should be similar in all observable and unobservable characteristics. As a result, any large jump in test scores at the age cutoff can be plausibly attributed to the pre-K program.
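For readers who like to see a method in code, here is a minimal simulated sketch of this regression discontinuity design. The sample size, trends, and jump are all invented for illustration; none of these numbers come from the state studies:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate children near a pre-K age-enrollment cutoff. "days" measures
# birth date relative to the cutoff; children on one side completed
# pre-K last year (treated), children on the other side are just starting.
n = 2000
days = rng.uniform(-180, 180, n)    # days from the birthday cutoff
treated = (days >= 0).astype(float)
true_jump = 0.8                     # invented pre-K effect, in test SD units
score = 0.002 * days + true_jump * treated + rng.normal(0, 1, n)

# Local linear regression: an intercept, the treatment "jump," and
# separate linear age trends on each side of the cutoff.
X = np.column_stack([np.ones(n), treated, days, treated * days])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
print(f"Estimated jump at the cutoff: {beta[1]:.2f}")  # should be near 0.8
```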

These “regression discontinuity” studies of state pre-K programs often find sizable effects of large-scale state pre-K programs in improving kindergarten readiness. These studies also find that pre-K has large benefits for middle-class children, not just low-income children.

Dr. Whitehurst’s argument against this regression discontinuity approach is that the test score differences could be due not to the pre-K program, but to differences in how parents treat children who are only a few days apart in age. The argument is that parents who know that their son or daughter will go to kindergarten next year, compared to parents whose son or daughter is a few days younger and therefore will not be going to kindergarten next year, will either work harder to make their child ready for kindergarten, or will do more to expose their child to older playmates.

Dr. Whitehurst’s argument is theoretically possible. But is it really empirically plausible that such large test score effects will occur due to such possible differential parent behavior for children only a few days apart in age?  I doubt that any such differences are even close to sufficient to account for the large test score effects we see.

Furthermore, the regression discontinuity studies consistently find effects on math test scores, not just on vocabulary and literacy test scores. Compared with literacy achievement, and especially with vocabulary achievement, math achievement is more strongly related to school factors than to home factors. If Whitehurst’s hypothesis were correct, we might expect the regression discontinuity methodology to find significant effects on vocabulary but not on math. This is not what we find.

Finally, several of the regression discontinuity studies complement their regression discontinuity results with evidence using good comparison groups.  The New Jersey, Arkansas, and Oklahoma studies use observable characteristics in comparing pre-K participants with non-participants. The Tennessee study uses random assignment to compare pre-K participants with non-participants.  In these studies, both the regression discontinuity approach and these other methodologies find positive effects of state pre-K programs on kindergarten readiness.

In sum, we have a large number of studies that point to the effectiveness of large-scale pre-K programs in improving outcomes for children. Although much of the evidence is not from random assignment experimentation, it is from studies with reasonable methodologies.  The evidence is consistent with smaller-scale studies that do use random assignment. Given the difficulties in implementing random assignment on a large scale, and the large amount of money and time it would take to do so, it is unreasonable to demand that every aspect of pre-K policy be backed by a random assignment experiment.

Children are only 4 once. If we delay policy innovation to wait for more and more long-term random assignment studies to be done, there is a potentially tremendous opportunity cost in not providing pre-K services to many cohorts of children.

Posted in Early childhood programs | 1 Comment

We have experience and research on scaling up quality pre-K

President Obama’s proposal for federal support for moving to universal pre-K for 4-year-olds will no doubt be fleshed out in the near future. As always, the devil is in the details.

However, I want to respond to one idea that seems to persist among many pundits/bloggers/commenters. This is the idea that we have no research evidence on scaling up quality preschool. I saw this in a blog post by Matt Yglesias at Slate. And a tweet by Mike Petrilli (Thomas B. Fordham Institute) also expressed similar sentiments.

I have commented on this extensively in several previous blog posts, because this idea has been around for a while. The idea that we have no experience in scaling up quality pre-K simply isn’t true. First, we have evidence that the Chicago Child-Parent Center program, a large-scale program run by the Chicago Public Schools, had large long-run effects on former participants into their late 20s. Second, we have extensive research evidence from many states with large-scale pre-K programs that these programs are highly effective in increasing kindergarten readiness.  Third, for Oklahoma, the state that is the poster child for universal pre-K, we have good research evidence from several studies that its program improves kindergarten readiness, not only for low-income children, but also for middle-class children.

I suppose someone could argue that these state programs’ effects on kindergarten readiness do not prove long-run effects. But these early effects are often of similar size to those achieved in the CPC program, and many of these state programs are similar in design to the CPC program. Therefore, it is a reasonable inference that these large-scale state programs will have long-run effects.

I’m a researcher. I would always like to see more research evidence. But in the case of preschool, we really do have a great deal of evidence already that large-scale programs can work.

Posted in Early childhood program design issues, Early childhood programs | Comments Off on We have experience and research on scaling up quality pre-K