In response to a reader request, I am taking a closer look at a recent article by Charles Murray, entitled “The Shaky Science Behind Obama’s Universal Pre-K”, published by Bloomberg News on February 20, 2013. Charles Murray, a well-known political scientist and scholar at the American Enterprise Institute, has written widely on many topics, including his argument that many of the problems of poverty stem from breakdowns in the American family.
Murray states the main point of his article up front:
“There are just two problems with [Obama’s proposal for expanded pre-K]: The evidence used to support the positive long-term effects of early childhood education is tenuous, even for the most intensive interventions. And for the kind of intervention that can be implemented on a national scale, the evidence is zero.”
My comment: This statement is incorrect, as I will demonstrate below. There is extensive evidence for expanded pre-K education, even for universal pre-K, from large-sample studies of programs that have already been implemented on a large scale.
Murray’s critique centers on two programs, Perry Preschool and Abecedarian:
“The main problem is the small size of the samples [for these two programs]… Another problem is that the evaluations of both Perry Preschool and Abecedarian were overseen by the same committed, well-intentioned people who conducted the demonstration projects.”
My comment: Murray chooses to focus on two programs with small sample sizes whose evaluations were run by the researchers who set up the programs. He overlooks preschool programs whose evaluations have large sample sizes and were conducted by outside researchers.
For example, if we want evidence of preschool’s long-run effects, we can look at the many evaluations of the Chicago Child-Parent Center (CPC) program. These evaluations rely on samples of more than 1,400 children, over ten times the sample size of Perry or Abecedarian, and they were done by outside researchers. They have found strong long-run benefits, with an estimated benefit-cost ratio of over 10 to 1.
If we want to look at evidence for preschool’s short-run or medium-run effects, we have many studies with large sample sizes conducted by outside researchers. These include many studies of state pre-K programs conducted by researchers at the National Institute for Early Education Research. These also include studies of Tennessee’s pre-K program conducted by researchers at Vanderbilt, and studies of North Carolina’s program conducted by researchers at Duke. Finally, Bill Gormley and his colleagues have done a series of studies of the effects of Tulsa’s pre-K program. All of these studies have found significant benefits of high-quality pre-K programs.
These studies typically look at short-term or medium-term effects of pre-K. However, they can be used to project long-term benefits based on the expected relationships between short-run test score gains and long-term effects on adult outcomes. For example, my recent study of Tulsa with Gormley and Adelstein projected that each dollar invested in pre-K would increase the present value of future earnings by $3 to $4. These large benefit-cost ratios held for both half-day and full-day pre-K programs at age 4, and for both low-income and middle-class kids. This study relied on a sample of more than 2,500 children, over 20 times the sample size of Perry or Abecedarian. And none of us researchers has had anything to do with designing or running Tulsa’s pre-K program.
Murray gets quite detailed about his concerns with the small sample size of Perry and Abecedarian:
“The main problem is the small size of the samples. Treatment and control groups work best when the numbers are large enough that idiosyncrasies in the randomization process even out. When you’re dealing with small samples, even small disparities in the treatment and control groups can have large effects on the results. There are reasons to worry that such disparities existed in both programs.”
My comment: What Murray overlooks is that standard statistical procedures account for the extra imprecision of small samples by making the confidence intervals around any estimated effect much wider. As Nobel Prize-winning economist James Heckman has pointed out, the small sample sizes and the resulting wide confidence intervals mean that the effects in Perry and Abecedarian had to be very large to be statistically significant at all:
“Charles Murray has made that claim [about small sample size] most recently, and others make it too… [But] a small sample would actually work toward not finding anything. You have a limited number of observations. You would argue that the statistical observations would not be very great, and there would not be much of them. There are methods that account for the small sample size. Size doesn’t matter. It holds up. There’s a lot of robustness here…”
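To make Heckman’s point concrete, here is a minimal Python sketch of how the 95% confidence interval for a treatment-control difference narrows as the sample grows. It assumes a standardized outcome (unit variance in both groups), equal-sized groups, and the usual normal approximation. The group sizes of 60 + 60 and 700 + 700 are illustrative round numbers for a Perry- or Abecedarian-sized study and a CPC-sized study, not the actual study designs, and `ci_half_width` is a hypothetical helper name.

```python
import math


def ci_half_width(n_treat: int, n_control: int, z: float = 1.96) -> float:
    """Approximate 95% CI half-width, in standard-deviation units, for a
    difference in mean outcomes between two groups. Assumes a standardized
    outcome with unit variance in both groups and a normal approximation."""
    return z * math.sqrt(1.0 / n_treat + 1.0 / n_control)


# Illustrative, equal-sized groups; not the exact designs of any study.
small = ci_half_width(60, 60)    # about 120 children in total
large = ci_half_width(700, 700)  # about 1,400 children in total

print(f"~120 children:   effects must exceed about {small:.2f} SD to be significant")
print(f"~1,400 children: effects must exceed about {large:.2f} SD to be significant")
```

An estimate is statistically significant at the 5% level only if it exceeds the half-width, so under these assumptions a study of about 120 children can detect only effects more than three times as large as those a study of about 1,400 children can detect. Small samples make it harder, not easier, to find positive results.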
Furthermore, if one is worried about “insiders” doing the research, or about problems with the randomization process, it should be reassuring that Heckman, a prominent “outside researcher”, has reanalyzed the Perry data and found that the original results hold up, even after accounting for some problems in the initial randomization process. Heckman won his Nobel Prize in large part for his research on how to overcome “selection bias” in evaluating the effects of public policies.
But Murray says his main reason for regarding Perry and Abecedarian as only tenuous evidence is that, in his view, they failed to replicate in a study with a larger sample:
“The most concrete reason for doubting the wider applicability of the Perry Preschool and Abecedarian effects is this: A large-scale, high-quality replication of the Abecedarian approach failed to achieve much of anything. Called the Infant Health and Development Program, it was begun in 1985. Like Abecedarian, IHDP identified infants at risk of developmental problems because of low birth weight and supplied similarly intensive intervention. Unlike Abecedarian, IHDP had a large sample (377 in the treatment group, 608 in the control group) spread over several sites assessed by independent researchers. IHDP provided a level of early intervention that couldn’t possibly be replicated nationwide, but it gave us by far the most thorough test of intensive early intervention to date.”
My comment: This is a strange critique of Obama’s proposal for expanding state-funded pre-K at age 4. The IHDP provided home visits from birth to age 3, and high-quality child care/preschool at ages 1 and 2. However, the program did not provide preschool at ages 3 or 4, so it is hard to see how it is particularly relevant to a proposal to expand preschool at age 4.
Furthermore, the IHDP differed significantly from Abecedarian in many respects. For one, Abecedarian provided full-time child care and preschool from birth until age 5. In addition, Abecedarian was targeted at high-risk children, whereas the IHDP was targeted at low-birth-weight children. Although IHDP used the Abecedarian curriculum in child care, the rest of the program was quite different, and it served a very different target group, so it is hardly a close replication of Abecedarian.
In addition, Murray’s negative spin on the effects of IHDP is not shared by the researchers he cites. In the age-18 follow-up, these researchers conclude that
“The results of this phase of the IHDP suggest a persistent benefit of the intervention for the subset of HLBW [heavier low-birth-weight] participants and absence or even reversal of any intervention effect for the youth born weighing less than or equal to 2000 g.”
In other words, the program seemed to have statistically significant positive effects on test scores at age 18 for the low-birth-weight participants who were closer to normal birth weights, and therefore more similar to the bulk of the Abecedarian sample. The researchers went on to suggest that the lack of an effect of the program in the “Lighter Low-Birth-Weight” (LLBW) group (less than 2000 g) might be due to less participation by very low-birth-weight participants in the center-based child care/preschool program at ages 1 and 2.
The researchers also note that at age 18 they cannot yet analyze educational outcomes the way studies with later follow-ups can, and it is not yet possible to estimate long-run earnings effects directly.
Furthermore, they note that in the HLBW group, the point estimates of savings in special education costs are similar to those for the Chicago Child-Parent Center program. However, because the HLBW group is less than half the overall sample, these estimates are imprecise and not statistically significant. In the Chicago CPC program, these special education savings exceed $5,000 per participant.
The same lack of statistical significance holds for anti-crime effects in the IHDP’s HLBW group, although the point estimates for reducing crime are about half those in the CPC study. In the Chicago CPC study, the anti-crime benefits alone had a present value of over $40,000 per participant, so even half that effect would imply very large anti-crime benefits from IHDP, although the estimates are inconclusive because of the small sample size.
In other words, it is fair to say that IHDP finds no evidence of long-run benefits for former child participants who started out as “lighter” low-birth-weight infants. But the study does find benefits for heavier low-birth-weight infants. For this group, however, the small sample makes it difficult to obtain statistically significant estimates for some effects, even when the point estimates are consistent with large benefits.
Finally, if we are going to evaluate early childhood programs in part by what they do for parents, IHDP does show significant effects in boosting maternal employment. When former child participants are age 18, over 15 years after IHDP stopped providing child care services, the mothers in the program group are significantly more likely to be employed. These effects at age 18 are statistically significant only in the lighter low-birth-weight group, where the program raises mothers’ employment rates from 73% to 86%, which is quite sizable. In my examination of the Abecedarian program’s benefits in boosting state residents’ earnings per capita, I found that more than half of the program’s earnings benefits came from boosting parents’ short-term and long-term earnings. The Abecedarian program could pass a benefit-cost test based solely on its effects on parental earnings.
Murray does concede that early education programs can work:
“The disappointing results from the IHDP don’t mean that early education can’t do any good. Other studies of good technical quality have convinced me that the best early education programs sometimes have positive long-term effects, though much more modest than the ones ascribed to Perry Preschool and Abecedarian.”
My comment: I agree that other preschool programs probably have smaller long-term effects than Perry and Abecedarian. However, “much more modest” seems an overstatement. Adult earnings effects for former child participants are about 19% for Perry and about 14% for Abecedarian (see Bartik, Gormley, and Adelstein for the sources of these calculations). Adult earnings effects for the Chicago Child-Parent Center are around 7%. And projected adult earnings effects for Tulsa for “free lunch” children are 7% for a half-day program at age 4, and 10% for a full-day program at age 4. Increasing average earnings by 7 to 10% is more than a modest effect.
Furthermore, benefit-cost ratios are not necessarily lower for programs other than Perry and Abecedarian. Perry cost over $17,000 per participant, and Abecedarian almost $40,000, compared to a little over $5,000 per year per participant for the Chicago Child-Parent Center program, and around $4,500 for a Tulsa half-day program and $9,000 for a full-day program. (These figures are in 2005-2006 prices and come from Bartik, Gormley, and Adelstein. The CPC figures are for a one-year program, which was the pattern for 55% of the study participants; the one-year participants had a higher benefit-cost ratio.) So programs that invest less get smaller percentage earnings effects, which is not surprising, but because they also cost much less, their benefit-cost ratios need not be lower. In my calculations of effects on state residents’ earnings per capita, a universal pre-K program modeled after CPC, and similar to Tulsa, has a higher benefit-cost ratio than the Abecedarian program.
But Murray goes on to claim that the best early education programs are not scalable:
“That leaves us with one last problem: None of those first-rate programs are replicable on a large scale. The kind of nationwide expansion of early education that Obama wants won’t have the highly motivated administrators and hand-picked staffs that demonstration projects enjoy, and the per-child cost of the interventions on the Perry Preschool and Abecedarian model are prohibitively high. If you’re going to have a national program, you’re going to get the kind of early education that Head Start provides.”
My comment: Murray doesn’t say what “other studies” he’s including beyond Perry and Abecedarian. However, his statement ignores that many “first-rate programs” that have been evaluated have already been implemented on a large scale, without “hand-picked” administrators and staff. These include the Chicago program, as well as various state programs, such as the Oklahoma program that funds Tulsa’s pre-K. If we’re going to have a national pre-K program for 4-year-olds that is primarily focused on kindergarten readiness in terms of both cognitive and social skills, we can choose to model that program after these successful large-scale state and city programs.
Furthermore, these large-scale programs cost less than one-third as much as Perry and perhaps one-eighth as much as Abecedarian (see the cost figures above). These costs are not prohibitively high. At about $5,000 per participant per year, I have estimated that a universal, high-quality half-day pre-K program for 4-year-olds might cost $14 billion annually. This is around $50 per U.S. resident, which is affordable for either the federal government or state governments.
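The cost ratios and the per-resident figure can be checked with a few lines of Python. This is a back-of-the-envelope sketch using only the rounded costs quoted above; the U.S. population of 315 million is an assumed round figure for 2013, not a number taken from the cost estimate.

```python
# Per-participant costs as quoted in the text (2005-2006 prices, rounded).
# The $5,000 figure is the CPC per-year cost, close to a Tulsa half-day program.
PERRY_COST = 17_000
ABECEDARIAN_COST = 40_000
LARGE_SCALE_COST = 5_000

# National cost estimate from the text. The population figure is an
# assumption (a round number for the 2013 U.S. population), not from the text.
NATIONAL_COST = 14e9
US_POPULATION = 315e6

share_of_perry = LARGE_SCALE_COST / PERRY_COST              # under one-third
share_of_abecedarian = LARGE_SCALE_COST / ABECEDARIAN_COST  # one-eighth
cost_per_resident = NATIONAL_COST / US_POPULATION           # dollars per resident

print(f"Share of Perry cost:       {share_of_perry:.2f}")
print(f"Share of Abecedarian cost: {share_of_abecedarian:.3f}")
print(f"Cost per U.S. resident:    ${cost_per_resident:.0f}")
```

Under this population assumption the national estimate works out to roughly $44 per resident, consistent with the “around $50” figure.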
In other words, a national program need not be modeled after Head Start in design or costs, but rather can follow these successful and affordable state and city models for pre-K services.
Murray then goes on to summarize the recent third-grade follow-up results of the national Head Start experiment:
“Of the 47 outcome measures reported separately for the 3-year-old and 4-year-old cohorts that were selected for the treatment group, 94 separate results in all, only six of them showed a statistically significant difference between the treatment and control group at the .05 level of probability — just a little more than the number you would expect to occur by chance. The evaluators, recognizing this, applied a statistical test that guards against such “false discoveries.” Out of the 94 measures, just two survived that test, one positive and one negative.”
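The “number you would expect by chance” in this quote is simple arithmetic: with 94 tests at the .05 level, about 4.7 would come out significant by chance alone. The Python sketch below also computes the probability of six or more such results, under the strong assumption (not made in the quote) that the 94 tests are independent; in reality many of the outcomes are correlated, so treat this as illustrative only. `prob_at_least` is a hypothetical helper name, and `math.comb` requires Python 3.8 or later.

```python
import math


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly.
    Assumes the n tests are independent, which real outcomes rarely are."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


N_TESTS = 94  # separate results, per the quote
ALPHA = 0.05  # significance level used in the report

expected_by_chance = N_TESTS * ALPHA
p_six_or_more = prob_at_least(6, N_TESTS, ALPHA)

print(f"Expected 'significant' results by chance: {expected_by_chance:.1f}")
print(f"P(6 or more by chance alone): {p_six_or_more:.2f}")
```

Under the independence assumption, six or more significant results would turn up by chance about one time in three, which is why a finding like this calls for a multiple-comparisons correction such as the false-discovery test the evaluators applied.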
My comment: I’ve already commented extensively on Head Start in several blog posts. Without repeating all that analysis in full detail, there are two things that this summary overlooks:
First, the Head Start study implicitly compares the effects of Head Start with the effects of whatever the control group did instead, which included preschool. According to the latest report, “Approximately 60 percent of the control group children participated in child care or early education programs during the first year of the study, with 13.8 percent of the 4-year-olds in the control group and 17.8 percent of the 3-year-olds in the control group finding their way into Head Start during the year.”
If some of these alternative preschool programs are highly effective state or local pre-K programs, this may significantly reduce any net Head Start effect. However, such a lower net Head Start effect does not imply that preschool doesn’t work compared to no preschool.
Second, this summary ignores that some good Head Start studies have found significant fade-out of Head Start’s test score effects, followed by a re-emergence of benefits at older ages and in adulthood. For example, Deming’s study of Head Start found that its initial effects on test scores at ages 5 and 6 had faded by 60% by ages 11 to 14. But these faded test score effects were accompanied by much larger effects on adult outcomes, which would predict adult earnings effects of Head Start of about 11%.
Murray goes on to place a somewhat puzzling emphasis on one aspect of the Head Start study:
“One aspect of the Head Start study deserves elaboration. The results I gave refer to the sample of children who were selected to be part of the treatment group. But 15 percent of the 3-year-old cohort and 20 percent of the 4-year-old cohort were no-shows — a provocative finding in itself. When the analysis is limited to children who actually participated in Head Start, some of those outcomes do become statistically significant, though still substantively small. But keep in mind that we’re looking at selection artifacts: Children who end up coming to the program every day have cognitive, emotional or parental assets going for them that children who fail to participate don’t have. This means that if somehow the no-shows could be forced to attend, you couldn’t expect them to get the same benefit as those who participated voluntarily. If you’re asking what impact we could expect by making Head Start available to all the nation’s children who might need it, you have to make the calculation based on giving access to the service.”
My comment: Murray’s discussion here is puzzling. We can adjust the Head Start estimates from what are called “Intent to Treat” (ITT) estimates to “Impact on the Treated” (IoT) estimates. The adjustment divides the ITT estimates by the difference between the treatment group and the control group in the proportion participating in Head Start. This inflates the estimates by about 40 to 50%. For example, for the 4-year-old cohort, 80% of the treatment group ended up participating in Head Start, vs. 14% of the control group, a difference of 66 percentage points. We assume that the ITT estimates are solely due to this participation difference, and therefore divide the ITT estimate for 4-year-olds by 0.66 to get the effect of going from no Head Start to Head Start participation.
But contrary to Murray, this adjustment has no implications for statistical significance. It simply makes both the estimated effects and their standard errors larger by the same percentage, so their ratio, which determines significance, is unchanged. This is noted in the Head Start report on page 89: “There is no change in the statistical significance of the estimates.”
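The ITT-to-IoT adjustment described above (sometimes called a Wald or instrumental-variables estimate) can be sketched in a few lines of Python. The participation rates are the 4-year-old figures from the text; the ITT estimate of 0.10 and standard error of 0.04 are made-up numbers for illustration, and `impact_on_treated` is a hypothetical helper. Dividing the standard error by the same factor treats the participation rates as fixed, a common simplification.

```python
def impact_on_treated(itt: float, itt_se: float, p_treat: float, p_control: float):
    """Rescale an intent-to-treat estimate and its standard error by
    1 / (participation difference). Assumes the ITT effect arises entirely
    through the participation difference, and treats the participation
    rates as fixed when scaling the standard error."""
    diff = p_treat - p_control
    if diff <= 0:
        raise ValueError("treatment group must participate more than control group")
    return itt / diff, itt_se / diff


# Participation rates for the 4-year-old cohort, as given in the text.
P_TREAT, P_CONTROL = 0.80, 0.14

# Hypothetical ITT estimate and standard error, for illustration only.
itt, itt_se = 0.10, 0.04

iot, iot_se = impact_on_treated(itt, itt_se, P_TREAT, P_CONTROL)
print(f"ITT: {itt:.3f} (t = {itt / itt_se:.2f})")
print(f"IoT: {iot:.3f} (t = {iot / iot_se:.2f})")
```

Both the effect and its standard error grow by a factor of about 1.5, so the t-statistic, and with it statistical significance, stays the same.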
Murray is right that “Impact on the Treated” estimates reflect effects for people who choose to participate in Head Start, and may not translate into effects on children forced to participate in Head Start. But it is unclear what relevance this would have to some hypothetical program that would expand voluntary access to Head Start. No one is proposing mandatory preschool.
Murray then summarizes his case as follows:
“The take-away from the story of early childhood education is that the very best programs probably do a modest amount of good in the long run, while the early education program that can feasibly be deployed on a national scale, Head Start, has never proved long-term results in half a century of existence.”
My comment: As stated above, there are many proven large-scale pre-K programs that are not Head Start and that show much more than modest benefits in the long-term.
In addition, Murray overlooks the many rigorous Head Start studies that show long-term benefits, including studies from Deming, Ludwig/Miller, and Garcia/Thomas/Currie. I’ve discussed this evidence in previous blog posts.
Murray might respond that these other studies are not random assignment experiments. But they have very good comparison groups for Head Start participants. By “good comparison groups”, I mean that the non-participants in Head Start are likely to be quite similar in observed and unobserved characteristics to the Head Start participants. Deming and Garcia/Thomas/Currie compare siblings who differ in Head Start participation. Ludwig/Miller compare counties that differed in whether they received help from the federal government in preparing their Head Start application back in the 1960s, based on whether the county was below or above some poverty threshold for such assistance. These are rigorous methodologies.
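A toy simulation shows why comparing siblings can provide a good comparison group. This is a hypothetical Python sketch, not the actual specification used by Deming or Garcia/Thomas/Currie: it assumes one unobserved family factor that raises children’s outcomes and makes Head Start attendance less likely, a made-up true effect of 0.2 standard deviations, and, crucially, that which sibling attends is otherwise random. The function name `simulate` and all parameter values are illustrative.

```python
import random
import statistics


def simulate(n_families: int = 20_000, true_effect: float = 0.2, seed: int = 0):
    """Simulate two siblings per family who share an unobserved family factor
    that raises their outcomes and lowers their chance of attending Head Start.
    Returns (naive estimate, sibling-difference estimate)."""
    rng = random.Random(seed)
    attended, did_not_attend, sibling_diffs = [], [], []
    for _ in range(n_families):
        family = rng.gauss(0, 1)               # unobserved family advantage
        p_attend = 0.7 if family < 0 else 0.3  # less advantaged families attend more often
        kids = []
        for _ in range(2):
            attends = rng.random() < p_attend
            outcome = family + true_effect * attends + rng.gauss(0, 1)
            kids.append((attends, outcome))
            (attended if attends else did_not_attend).append(outcome)
        # Only families where exactly one sibling attended contribute a
        # within-family comparison; the shared family factor cancels out.
        (a_attends, a_outcome), (b_attends, b_outcome) = kids
        if a_attends != b_attends:
            diff = a_outcome - b_outcome
            sibling_diffs.append(diff if a_attends else -diff)
    naive = statistics.mean(attended) - statistics.mean(did_not_attend)
    within_family = statistics.mean(sibling_diffs)
    return naive, within_family


naive, within_family = simulate()
print(f"Naive comparison:   {naive:+.2f}")          # biased well below the true 0.20
print(f"Sibling comparison: {within_family:+.2f}")  # close to the true 0.20
```

Because Head Start families are less advantaged in this setup, the naive participant vs. non-participant comparison makes Head Start look harmful, while the sibling comparison recovers the true effect because the shared family factor cancels. Real sibling studies must argue that which sibling attends is unrelated to other child differences; that is what makes their comparison groups credible.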
While random assignment would be ideal in a world with infinite resources and time, it is expensive and cumbersome, and a new experiment would by definition take a long time to yield long-term results. We should not throw away the results of other rigorous studies just because they lack random assignment.
Murray then goes on to summarize his case more bluntly:
“Let me rephrase this more starkly: As of 2013, no one knows how to use government programs to provide large numbers of small children who are not flourishing with what they need. It’s not a matter of money. We just don’t know how.”
My comment: For all the reasons outlined above, I think this is incorrect. Many state and local areas are already implementing large-scale pre-K programs that have good evidence for both short-run and long-run benefits.
Your average American state government or local school district can successfully carry out large-scale preschool programs. To do so, that state or local government agency must be willing to spend a reasonable amount per student, have well-trained and well-paid teachers, have reasonable class sizes, and have a good curriculum that focuses on both cognitive and social skills. But if those elements of quality are present, pre-K programs can achieve significant short-run and long-run benefits both for former participants and for our economy and society as a whole.