Experimental vs. non-experimental evidence on early childhood programs

A recent evaluation of Tennessee’s pre-K program, conducted by Vanderbilt researchers Mark Lipsey, Kerry Hofer, Nianbo Dong, Dale Farran, and Carol Bilbrey, found very mixed results. The program’s initial effects on academic achievement at the end of pre-K seemed to mostly disappear within a year or two. The program did, however, show some evidence of reducing the percentage of students retained in kindergarten. Because kindergarten retention reflects behavior as well as academic achievement, and independently predicts future success, these retention effects could have long-lasting consequences for pre-K participants as they age.

This report has already been used by opponents of pre-K. Tennessee state representative Bill Dunn argued that this report shows that benefit claims for pre-K are “hype”. According to Rep. Dunn,

“If you do a cost-benefit analysis on this extremely expensive program, you will come to the conclusion that it is like paying $1,000 for a McDonald’s hamburger. It may make an initial dent on your hunger, but it doesn’t last long and you soon realize you could have done a lot more with the money spent.”

Steve Barnett, Director of the National Institute for Early Education Research, has already provided some useful analysis of the Tennessee study. As Professor Barnett points out, Tennessee’s program spends far less than would be needed to ensure adequate quality. Program spending per child for a full-day school-year program is $5,814. NIEER estimates, based on studies by the Institute for Women’s Policy Research, that a quality full-day pre-K program in Tennessee would cost $8,059, 39% more than what Tennessee spends. (For the IWPR research, see “Meaningful Investments in Pre-K”; for NIEER’s estimates, see Table 7 in the Executive Summary of the latest NIEER report on state pre-K programs.)

As Barnett points out, even the initial estimated effects of Tennessee’s pre-K program were smaller than in some other programs with higher spending per child. If initial effects are small, they can be more easily offset by children’s K-12 experiences. In particular, we expect that teachers will try to intervene with kindergartners and first-graders to offset disadvantages, which will tend to help children who were behind, whether due to a lack of pre-K or other causes, catch up to other students. These offsetting interventions tend to reduce the measured effect of pre-K on student achievement. They also obscure one benefit of pre-K, which is that it may reduce the need for such remedial intervention, which inevitably comes at some opportunity cost in teachers’ time and attention.

The main point I would add to Barnett’s analysis is that the Tennessee report illustrates the enormous difficulties of conducting a true “gold standard” random assignment experiment. One aspect of the report that will no doubt be cited by pre-K opponents is that the Tennessee study involved random assignment to treatment and control groups. Random assignment is argued to give more reliable results than other types of evaluation. And it is true that if there are no issues with data collection and implementation, and if the sample is large enough, random assignment experiments will reveal the true causal effects of a program. The treatment and control groups will on average be expected to be identical in observed and unobserved characteristics, and with a large enough sample, these expectations become increasingly likely to be realized. Therefore, any differences that are observed must be due to the program intervention.

In the real world, it is quite difficult to actually carry out and implement a perfect random assignment study. Inevitably, problems crop up. For the Tennessee study, the academic achievement results so far are based on a relatively modest sub-sample of the treatment and control group for which parents agreed to answer questions and have tests administered to their children. The first cohort had only 46% of the pre-K participants agreeing to such data collection, and a much smaller 32% of non-participants. These low consent rates are troubling, and the differential across treatment versus control groups is also troubling. The participation rate increased in the second cohort to 74% for participants and 68% for non-participants, which is still much lower and more differential than we would prefer.

The problem is that even if the original sample is randomly divided into treatment and control groups, and that therefore the two groups are expected on average to be similar in unobserved characteristics, there is no reason to think that this is true of the smaller sample that agreed to data collection. The “gold standard” nature of the evidence is weakened or perhaps even eliminated.

The Vanderbilt researchers try to do what they can to match treatment and control group members on observables. This is what is commonly done in any non-experimental study. But such matching can only control for observable differences between treatment and control groups. Unobservable differences may still remain. The Tennessee study probably suffers from this issue at least as much as the average non-experimental study.

Other problems with this experiment are due to crossover between treatment vs. control group assignment and what pre-K services are actually received. According to the Vanderbilt research report, about 19% of the sample ended up not following their random assignment – either they were randomly assigned to pre-K, but ended up not participating, or they were randomly assigned to the control group, but somehow succeeded in enrolling in the state pre-K program.

Most of the Tennessee report’s analysis is based on comparing students who actually participated in pre-K versus those who did not participate. Again, this is no longer a pure random assignment experiment, as there are many reasons why the 19% of the sample who broke with their assignment might differ from other families.

Finally, a complexity of the Tennessee study is that it is not really a single experiment. Rather, it is an average of separate experiments in 58 different pre-K programs across the state. In each of these pre-K programs, applicants exceeded slots available, so a random assignment lottery was run to determine enrollment. But the result is that the relative size of the treatment versus control group varies systematically across pre-K programs. The study corrects for this problem by weighting the treatment and control group separately so that the weighted population for each group has the same probability of coming from each of the local pre-K programs. This is one way to deal with this problem, but results might differ under other approaches. (For example, one could estimate a model in which treatment effects are estimated separately by pre-K program, and then weighted up in various ways.)
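To make this weighting idea concrete, here is a minimal sketch in Python with entirely invented data (three sites rather than 58, and made-up lottery odds and outcomes; this is an illustration of the general technique, not the study’s actual code). Each child is reweighted so that the treatment and control groups have the same weighted distribution across sites, which keeps site composition from contaminating the comparison.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: site id, lottery outcome, and test score for each child.
sites = rng.integers(0, 3, size=600)
treated = rng.random(600) < np.array([0.8, 0.5, 0.3])[sites]  # lottery odds vary by site
outcome = 0.2 * treated + 0.5 * sites + rng.normal(0, 1, 600)

# Weight each child so that, within the treatment and control groups separately,
# the weighted share of children from each site matches the site's overall share.
overall_share = np.bincount(sites) / len(sites)
weights = np.empty(len(sites))
for g in (True, False):
    mask = treated == g
    group_share = np.bincount(sites[mask], minlength=3) / mask.sum()
    weights[mask] = (overall_share / group_share)[sites[mask]]

# Weighted difference in means: a treatment effect with site composition held fixed.
effect = (np.average(outcome[treated], weights=weights[treated])
          - np.average(outcome[~treated], weights=weights[~treated]))
print(round(effect, 2))
```

The unweighted difference in means would be biased here, because sites with higher outcomes also have higher lottery odds; the weighting removes exactly that confounding.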

None of this means that the Tennessee results provide no useful information. The study’s researchers worked hard to provide useful information despite the limitations of their data, which reduced the methodological advantages of the initial random assignment. I regard the results as providing useful information that increases my belief that the initial academic test score effects of Tennessee’s pre-K program are probably to a large extent offset in early elementary grades by the K-12 system’s interventions. The jury is still out on whether the behavioral effects of Tennessee’s program on kindergarten retention will translate into large effects on high school graduation rates or other long-term behaviors.

But I don’t think that the Tennessee study’s results should be given greater weight than a well-done study with a non-randomly chosen control group. For example, I suspect that the Head Start studies that compare siblings who do and do not participate in Head Start are using at least as reliable a method for determining causal effects of pre-K as the Tennessee study. The final sample that is analyzed in the Tennessee study is very far from being a sample in which one can be at all confident that the treatment and control group are the same in unobserved characteristics.  Therefore, it is unclear whether the estimated differences between the treatment and control groups represent the true effects of the program.

There is no magic pre-K evaluation methodology that trumps all other methodologies. In the real world, determining what is the true effect of a program is difficult. There are better and worse methodologies for doing so, and better and worse datasets. Random assignment studies are rarely even close to perfect, and other methodologies can also do a good job of providing comparison groups that are truly comparable.  Our evaluation of the effects of pre-K programs should be based on the totality of the evidence from multiple studies with diverse data and methodologies, rather than solely on any one study that happens to incorporate random assignment.


The effectiveness of many state and local pre-K programs has been backed by sound research

Russ Whitehurst of the Brookings Institution has yet another blog post attacking President Obama’s preschool proposal. (I have responded to three previous blog posts on this topic by Whitehurst.)

The most recent post by Whitehurst, co-authored with David Armor, was posted on July 24, 2013. The post is entitled “Obama’s Preschool Proposal is Not Based on Sound Research”.

The Whitehurst-Armor argument can be summarized as follows:  Head Start does not have large enough test score effects, states with extensive pre-K programs do not have large enough boosts in 4th grade test scores, and existing studies of state and local pre-K programs are methodologically deficient.

My counter-argument can be summarized as follows: Head Start has shown large long-run effects even with test-score fading; even if we think Head Start effects are too small, Head Start evidence is of limited relevance to the Obama proposal; even boosts in 4th grade test scores that appear modest can have economic and social benefits that far exceed pre-K costs; there are many methodologically sound studies of state and local pre-K programs that show large short-run and long-run effects.

In other words, the Whitehurst-Armor post ignores the extensive evidence that contradicts their thesis, and that is most directly relevant to evaluating the Obama Administration preschool proposal.

The Whitehurst-Armor blog post begins by again citing the Head Start random assignment study:

“The most credible recent study of pre-K outcomes, the federal Head Start Impact Study, found only small differences at the end of the Head Start year between the performance of children randomly assigned to Head Start vs. the control group, e.g., about a month’s superiority in vocabulary for the Head Start group. There were virtually no differences between Head Start and the control group once the children were in elementary school.”

My first point is that it is questionable whether the most relevant recent research for the Obama Administration’s preschool proposal is a study of Head Start. The Obama Administration’s preschool proposal is not a proposal that relies on expanding Head Start. Rather, the Obama Administration proposal is to expand support for state pre-K programs for 4-year-olds. Head Start resources would be shifted towards 3-year-olds. I discussed this previously in a blog post on a Wall Street Journal editorial, which also tried to use the Head Start impact study results to criticize proposals to expand state pre-K programs.

The better state and local pre-K programs show larger short-run effects on test scores than is true of Head Start. (See my review of this evidence in Table 1 and surrounding text of my recent paper on a Kalamazoo pre-K program.) Whitehurst questions some of this evidence, to which I will respond below, but it certainly is of limited relevance to use Head Start evidence to attack a proposal that expands quite different preschool programs that are sometimes more educationally focused than some Head Start programs, and that have a quite different governance structure.

My second point is that to the extent to which the Head Start research evidence is relevant to a proposal to expand state pre-K programs, Whitehurst and Armor are ignoring the significant research evidence that shows long-run effects of Head Start.  As I have pointed out in previous blog posts, there is research evidence with good comparison groups that shows long-run Head Start benefits. This includes research that compares similar counties with different early access to Head Start, and studies that compare siblings in which one sibling participates in Head Start and the other does not.

What is particularly interesting is that this long-run research evidence also shows long-run benefits of Head Start even when short-term effects fade. For example, Deming’s research shows quite large effects of Head Start in increasing educational attainment and employment rates, and reducing crime involvement. These benefits are sufficient to predict a long-run wage gain due to Head Start of 11%, and a rate of return to the public investment in Head Start of almost 8% in real terms. But Deming’s research also shows that the initial test score effects of Head Start fade to statistical insignificance as former participants go through the K-12 system. Apparently this fading of test score effects does not determine other long-term outcomes.

Whitehurst and Armor then mention Whitehurst’s previous blog post that argued that states with more pre-K enrollment did not have significantly higher 4th grade test score results. As I argued in a previous response, the correlation between state pre-K enrollment and 4th grade test scores is actually sufficiently strong to imply a 5 to 1 benefit-cost ratio for adding enrollment in typical state pre-K programs.

Whitehurst and Armor then go on to argue extensively against the methodology used in studying many (not all!) state pre-K programs, which are labeled as “regression discontinuity” studies. What these studies do is administer the same tests to pre-K entrants as they enter pre-K, and to pre-K graduates as they enter kindergarten. All the students administered tests are similar in whatever observed and unobserved characteristics determine selection into the pre-K program. But the pre-K graduates differ in two respects: they are a year older; they have experienced a year of pre-K. Because pre-K programs and kindergarten programs use an age cut-off to determine enrollment, we have children within each group who differ in age by up to a year. In fact, we have children in the pre-K entrant group who are just a few days younger than children in the pre-K graduate group. Therefore, we can use the evidence in the sample on how test scores vary with age to separate out the effects of aging on test scores from the effects of pre-K. One way to understand this is that we see how test scores “jump” when comparing students who just missed the pre-K cutoff the previous year versus students a few days older who just made the pre-K cutoff the previous year. This “jump” is the estimated effect of pre-K. (For more extensive discussions of this methodology, see my paper on Kalamazoo pre-K, or my paper with Gormley and Adelstein on Tulsa pre-K.)
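The age-cutoff logic can be illustrated with a minimal simulation in Python (all numbers invented; this is a sketch of the general regression discontinuity idea, not any particular study’s estimation code). Scores rise smoothly with age, pre-K adds a jump at the birthday cutoff, and fitting a separate line on each side of the cutoff recovers that jump.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: children tested at kindergarten entry. "age" is days born
# before (+) or after (-) the pre-K birthday cutoff; those born before the
# cutoff (age >= 0) attended pre-K the prior year.
n = 2000
age = rng.uniform(-365, 365, n)
attended = age >= 0
# Scores rise smoothly with age, plus a jump of 5 points for pre-K graduates.
score = 60 + 0.02 * age + 5 * attended + rng.normal(0, 8, n)

# Fit a line on each side of the cutoff and measure the jump at age zero.
def intercept_at_cutoff(x, y):
    slope, intercept = np.polyfit(x, y, 1)  # polyfit returns [slope, intercept]
    return intercept

jump = (intercept_at_cutoff(age[attended], score[attended])
        - intercept_at_cutoff(age[~attended], score[~attended]))
print(round(jump, 1))  # estimated pre-K effect; should be near the true 5 points
```

The key feature is that children a few days on either side of the cutoff are essentially identical except for the year of pre-K, so the fitted jump isolates the program effect from the smooth effect of aging.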

My response to Whitehurst and Armor’s critique of regression discontinuity methodology can be summarized as follows: first, regression discontinuity studies do provide reliable and policy-relevant estimates of the short-term effects of age 4 pre-K programs versus no such programs. Second, their critique ignores the many good studies of state and local pre-K programs that do not rely on regression discontinuity.

Whitehurst and Armor’s first methodological comment is that regression discontinuity studies are making a somewhat different comparison than would be made by a random assignment study of pre-K. This is true, but is less policy relevant than they imply. Regression discontinuity studies are using a comparison group of children who just missed the pre-K age cutoff for 4-year-olds. In contrast, the comparison group in a random assignment study is age-eligible for 4-year-old preschool programs. As they point out, the comparison group in a random assignment study is more likely to participate in preschool, and in more educationally-oriented preschool, than is true of children who miss the age-4 pre-K cutoff.

Whitehurst and Armor’s point is that what a state legislator should want to know is how expanding pre-K access will affect children compared to not expanding access. Therefore, the relevant counterfactual is what these 4-year-olds would be doing without the state program. They argue that this is what the random assignment experiment estimates.

However, their argument overlooks that what the state legislator really should want to know is how the benefits and costs compare between two groups, one of which has greater access to cheaper and higher-quality pre-K than the second group. So in the random assignment study, the benefit-cost analysis would not only have to look at the benefits of the greater preschool access, but also the cost savings for existing preschool programs from greater public preschool access. Many of the children in the control group are also in preschool programs with large costs paid for by the government or by parents. These cost savings should be subtracted from the net costs of expanding pre-K. Therefore, the benefit-cost analysis from a random assignment study should be comparing the CHANGE in benefits from the change in preschool access with the CHANGE in costs from the change in preschool access. It would underestimate the benefit-cost ratio if we compared this CHANGE in benefits with the total public costs of the pre-K expansion, as this overlooks the reduction in costs of existing preschool programs.
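A tiny numeric example makes this accounting concrete (all dollar figures are invented for illustration, not estimates from any study):

```python
# Compare the CHANGE in benefits with the CHANGE in costs when public
# pre-K access expands. All figures are hypothetical.
per_child_cost_of_new_program = 8000      # public cost per new pre-K slot
existing_preschool_cost_displaced = 3000  # average cost of the preschool the
                                          # control group would otherwise use
benefit_of_program_vs_status_quo = 20000  # present value of benefits per child

# Naive ratio: benefits divided by the full public cost of the expansion.
naive_ratio = benefit_of_program_vs_status_quo / per_child_cost_of_new_program

# Correct ratio: benefits divided by the NET cost, crediting the savings
# from displaced spending on existing preschool programs.
net_cost = per_child_cost_of_new_program - existing_preschool_cost_displaced
correct_ratio = benefit_of_program_vs_status_quo / net_cost

print(naive_ratio, correct_ratio)  # prints 2.5 4.0
```

With these made-up numbers, ignoring the displaced spending understates the benefit-cost ratio (2.5 rather than 4.0), which is exactly the underestimation described above.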

But exactly this same calculation can be done using the estimates from the regression discontinuity studies. Regression discontinuity studies come closer than random assignment studies to estimating the impact of preschool versus no preschool for children; in contrast, random assignment studies compare a particular pre-K program versus currently available pre-K programs.  But if we examine a proposed expansion of pre-K programs, we can estimate how this will affect the total number of kids enrolled in reasonable quality pre-K programs, either by plausible assumptions before the program is implemented, or by actual data after the program is implemented. The regression discontinuity studies provide useful estimates of the kindergarten readiness test score effects of having more kids involved in reasonable quality pre-K programs. These estimates can be compared with the net incremental costs of funding these additional slots.

Although it might seem that this benefit-cost calculation from regression discontinuity research evidence requires more assumptions, exactly the same exercise would have to be done using evidence from random assignment studies. Any random assignment study’s estimated effects for pre-K programs depend in part on what other pre-K programs are around. But what pre-K programs are around is always changing due to different parent behavior or changes in a wide variety of government subsidies for pre-K and child care. So random assignment studies can only be used for policy analysis if we adjust the raw estimates for changes over time in what other pre-K programs are available. There is no getting around the need for us to adjust our benefit-cost estimates for the current educational environment.

Whitehurst and Armor’s second methodological point is that regression discontinuity studies may suffer from differential attrition in the “treatment group” versus the comparison group, which may bias the results. This is true, but is also true of any real-world random assignment study. In random assignment studies, it is almost always true that not all participants in the treatment and control group can be tracked down, and that attrition could possibly vary based on the effects of the “treatment”.

However, in both regression discontinuity studies and random assignment studies, we can make some attempt to see how attrition may bias the results by looking at how observable variables differ between the treatment group and the comparison group who remain, after attrition. We can never test for possible differences between the two groups in “unobservable” variables. However, it seems plausible that if attrition led to some differences in unobservable variables, it would also lead to some differences in observable variables.
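Such a balance check is straightforward to compute. Here is a minimal sketch in Python with invented data, using the standardized mean difference that is commonly reported in evaluation studies (the covariate, sample sizes, and the 0.1 rule of thumb are illustrative assumptions, not details from any particular study):

```python
import random
import statistics

random.seed(2)

def standardized_difference(treated_values, control_values):
    """Difference in group means, scaled by the pooled standard deviation."""
    pooled_var = (statistics.variance(treated_values)
                  + statistics.variance(control_values)) / 2
    return (statistics.mean(treated_values)
            - statistics.mean(control_values)) / pooled_var ** 0.5

# Hypothetical post-attrition sample: an observable covariate (mother's years
# of education) for the remaining treatment and comparison children.
treated = [random.gauss(12, 2) for _ in range(250)]
control = [random.gauss(12, 2) for _ in range(250)]

d = standardized_difference(treated, control)
# A common rule of thumb flags |d| above 0.1 as a meaningful imbalance.
print(f"standardized difference: {d:+.3f}", "FLAG" if abs(d) > 0.1 else "ok")
```

Running such a check on each observable covariate, as described above, cannot rule out unobservable differences, but large observable imbalances would be a warning sign that attrition has compromised the comparison.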

These tests for differences in observable variables have been regularly done in regression discontinuity studies of state and local pre-K programs. For example, my paper with Gormley and Adelstein did such a test. We found no evidence that attrition or anything else led to any differences in observable variables between the treatment and comparison groups, the pre-K graduates versus the pre-K entrants.

Whitehurst and Armor’s final methodological point is that “age-cutoff regression discontinuity designs produce implausibly large estimates of effects.” Why are they implausibly large? Because they are much larger than the Head Start effects! The reasoning here is somewhat circular. It appears that no regression discontinuity estimates for state pre-K programs will be accepted by Whitehurst if they significantly exceed the random assignment Head Start estimates, even though these estimates are for quite different programs.

As the discussion above suggests, we would expect regression discontinuity studies to yield somewhat larger raw effects for pre-K programs than would be true for random assignment studies. Regression discontinuity studies estimate the effect of pre-K versus no pre-K, whereas random assignment studies estimate the effect of a pre-K program versus the status quo of what is available. However, once these estimates are embedded in a benefit-cost analysis that compares the differences in pre-K access and costs in two different scenarios, these differences will disappear.

Are regression discontinuity studies’ estimated effects implausibly large? The very studies that Whitehurst and Armor cite don’t suggest this. They cite a Tulsa study showing test score improvements of 9 months, a New Jersey study showing improvements of 4 months, and a Boston study showing improvements of 6 months. If typical students improve by 9 months during the pre-K year without pre-K, then what these studies are suggesting is that pre-K, versus no pre-K, increases learning during the school year by 44% (New Jersey, an additional 4 months on a base of 9 months), 67% (Boston), or 100% (Tulsa). These improvements from pre-K versus no pre-K do not sound implausible to me – learning pace increases of 44% to 100% seem intuitively plausible in an educationally-focused program.
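The percentages above follow directly from the months of learning cited:

```python
# Extra months of learning from pre-K, relative to the 9 months a typical
# child gains during the year without pre-K (figures as cited in the text).
base_months = 9
extra = {"New Jersey": 4, "Boston": 6, "Tulsa": 9}

for place, months in extra.items():
    pct = round(100 * months / base_months)
    print(f"{place}: learning pace up {pct}%")
# prints: New Jersey: learning pace up 44%
#         Boston: learning pace up 67%
#         Tulsa: learning pace up 100%
```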

There are two other pieces of evidence that I would present for the reliability of the regression discontinuity evidence. First, there is a study in Tennessee that uses both regression discontinuity methods and random assignment methods to study a state pre-K program. Both approaches show statistically significant effects of Tennessee’s pre-K program (see, for example, Lipsey et al., 2010). The regression discontinuity estimates are somewhat larger (an effect size averaging 0.64 across the various tests, versus an effect size of 0.34 from random assignment). But we would expect this, as the regression discontinuity estimates measure the effects of preschool versus no preschool, whereas the random assignment estimates measure expanded preschool versus what preschool is currently available. The estimates are reasonably consistent.

Second, my estimates with Gormley and Adelstein using regression discontinuity methods for Tulsa’s pre-K program show a pattern of results across half-day and full-day pre-K programs that is reasonable. For example, our estimates show that half-day pre-K increases test scores for children eligible for a free lunch by 12 percentiles, whereas full-day pre-K increases test scores for children eligible for a free lunch by 18 percentiles. This pattern is very reasonable, as one would expect a greater return to full-day than half-day pre-K, but perhaps not a doubled return. If regression discontinuity estimates were seriously biased, we would not necessarily expect these biases to result in such a reasonable pattern in estimated test score effects.

Finally, Whitehurst and Armor’s critique of regression discontinuity methods ignores the research evidence from studies of state and local pre-K programs that use other research methods. There are a wide variety of such studies. The studies with the best long-run evidence are for the Chicago Child-Parent Center program.  Arthur Reynolds and his colleagues have done a series of studies of this program showing large long-run effects in boosting educational attainment, reducing crime, and boosting earnings. The implied benefit-cost ratios are quite large.

I don’t see how any balanced discussion of Obama’s proposal to expand state and local pre-K programs can ignore the Chicago Child-Parent Center program and its research evidence. The CPC program is much more similar to the higher-quality state and local pre-K programs around the U.S. than is true of Head Start.

In sum, there is significant research evidence that supports efforts to expand high-quality state and local pre-K programs for 4 year olds. This evidence goes well beyond the Head Start evidence to consider numerous regression discontinuity and other studies of state and local pre-K programs.

Is the evidence perfect? No, but in the real world, evidence for any policy intervention will never be perfect.  Whitehurst and Armor go on to advocate for more “demonstration” research projects on pre-K rather than implementing pre-K on a large scale. If the case for pre-K is plausible, this position has a tremendous “opportunity cost”: all the children who could have benefitted from pre-K with an expansion who will not do so because we are waiting around for the elusive definitive study that will answer all questions.

In my view, the research case for state and local pre-K is strong enough that a better course of action is implementing greater access to higher quality pre-K on a large scale while continuing to study how to improve pre-K quality.  While more research is always needed, the need for long-term research should not trump the needs of children today.


Local areas matter for income mobility, especially for younger kids, and especially for kids from lower-income families

David Leonhardt of the New York Times has highlighted a recent article by economists Raj Chetty, Nathaniel Hendren, Patrick Kline, and Emmanuel Saez. Chetty et al use IRS data to look at how the income mobility of children varies across different local areas in the U.S. Income mobility is the child’s rank in the U.S. income distribution at ages 30 or 31, in 2010 or 2011, versus their parents’ rank in the income distribution. This study is able to present such detailed local results largely because the access to IRS data provides unusually large sample sizes for each of the 750 “Commuting Zone” local areas that are defined so as to comprise the entire U.S.

Leonhardt does a good job of summarizing many of the study’s key results. The most important finding is that income mobility varies quite a bit across local areas in the U.S., and that only a modest part of these income mobility differences can be explained by observable variables in this database.

The Chetty et al study considers income mobility for persons who were born well before states began large scale investments in pre-K programs or other early childhood programs. However, I think the study provides indirect evidence that a wide variety of policies that can enhance the experiences of children when they are young may positively affect income mobility.

One of the most interesting findings is that the effects of local areas on income mobility also occur for children whose parents moved into a local area, but much more so if the parents moved in when the child was young. Apparently something about the environment of a local area makes a key difference to the potential for upward mobility of a child, but only if the child experiences that environment at a young enough age. The importance of local areas at younger ages appears to be robust to comparing two siblings in the same family, one of whom experienced the local area at a younger age than the other. Therefore, the importance of local areas at younger ages is not due to some unobservable differences across families in who moves where.

Another interesting finding is that local areas matter much more to income mobility for children from lower income families.  The intuition might be that upper-income families are able to insulate children more from some of the problems in a metro area, whereas lower-income families are affected more by the local “environment”: the local economy, social culture, and educational opportunities.

The Chetty et al. study finds some evidence that local areas will have greater income mobility if they have less residential income segregation, more generous earned income tax credits for low-income families, higher test scores in K-12, and lower high school dropout rates. Both K-12 test scores and high school dropout rates can be improved by high-quality early childhood education programs. Therefore, early childhood programs should at least have important indirect effects in increasing income mobility. It is possible that if we had later data on income mobility, with more variation across local areas in early childhood programs, early childhood programs would show even larger direct effects in improving income mobility. However, obviously many other aspects of the local environment for lower-income families also affect income mobility. Improving early childhood programs is only part of what needs to happen to improve U.S. income mobility. Dealing with issues of income segregation and racial segregation, and directly helping the living standards of low-income families with young children, are also important policies to consider.

But in my view, the key point remains: local areas matter. There is some tendency in national debates in the U.S. to assume that what matters to important economic and social outcomes is the national economy and national policy, and that local areas have only a peripheral role in affecting these outcomes. That is not the case. What state and local governments choose to do in public policy has a major effect on the long-term economic well-being of the children who grow up in a particular state or local area. Policy activists who focus on state and local public policy, including early childhood advocates, should take this as encouragement to redouble their efforts to affect state and local policies. State and local policies make a big difference, especially for young children, and especially for young children from lower-income families.


Pre-K teacher salaries, teacher quality and turnover, and outcomes for children

Marcy Whitebook, director of the Center for the Study of Child Care Employment at the University of California-Berkeley, has a useful brief article on the consequences of low pre-K teacher salaries for providing quality pre-K programs on a large scale.  This article appears in the latest issue of the Upjohn Institute newsletter.

As Dr. Whitebook points out, pre-K teachers earn quite low wages on average compared to other teachers. Pre-K teachers earn less than $20 per hour, whereas kindergarten teachers earn close to $35 per hour.  This is consistent with evidence I have presented in a previous blog post.

Dr. Whitebook emphasizes that these low salaries lead to higher teacher turnover.  Low salaries also discourage high-quality teachers from choosing or persisting in the pre-K field.  Higher pre-K teacher salaries do not by themselves guarantee better child outcomes. And certainly on a small scale, and in individual pre-K programs, there are dedicated pre-K teachers who do a great job despite being paid low salaries. But it is hard to imagine how pre-K programs can be implemented on a large scale in a high-quality manner without teacher salaries being increased to consistently attract and retain more high-quality teachers.

Posted in Early childhood program design issues, Early childhood programs | Comments Off on Pre-K teacher salaries, teacher quality and turnover, and outcomes for children

My recent appearance on HuffPost Live

On Wednesday, June 19, I participated in a video discussion of preschool on HuffPost Live. An archive of the approximately 25 minute discussion can be found here.

Posted in Early childhood program design issues, Early childhood programs | Comments Off on My recent appearance on HuffPost Live

Brookings article provides support for high benefit-cost ratios for state pre-K, but you wouldn’t know it from the article

In response to a reader request, I looked at a recent article on Obama’s preschool proposal, written by Grover J. “Russ” Whitehurst of the Brookings Institution.  Dr. Whitehurst is a child psychologist who previously directed the research arm of the Department of Education.

The purpose of the article is to see what student achievement effects might be plausible for President Obama’s proposal to help states expand their pre-K programs. Dr. Whitehurst is critical of Secretary of Education Duncan for citing Nobel Prize winning economist James Heckman’s research that programs such as the Perry Preschool program have a return of $7 for every dollar invested.

Dr. Whitehurst comments that

“I’m willing to bet the farm that the typical state pre-K program for four-year-olds that would be expanded if the Obama administration’s proposal were enacted isn’t going to have the impact of Perry…”

My comment: I agree that the typical state pre-K program would probably not have as great an impact as Perry. Of course, the typical state pre-K program that would be expanded probably will spend only a fraction of what Perry would cost. Perry had a class size of 13 students per 2 certified teachers paid public school wages, and lasted for 2 years (ages 3 and 4) for most child participants. The total cost in today’s dollars was over $11,000 per student per year, or $22,000 per student for the two years. The typical state pre-K program does not spend anywhere near that amount per student. So, expanding state pre-K programs could still have very high benefit-cost ratios even if their benefits are only a small fraction of Perry’s benefits.

Dr. Whitehurst goes on to do some analysis of the correlation between state pre-K enrollments in 2006, and 4th grade test scores on the National Assessment of Educational Progress in 2011, 5 years later, when those former pre-K participants would have been in 4th grade.  He controls for state median income, the percentage of the state’s school-aged population that is non-white, and the percentage of the state’s 4th graders who qualify for a free or reduced price lunch. He finds that with these controls, higher state pre-K participation in 2006 is associated with higher NAEP test scores in 2011 in reading and math.

As Dr. Whitehurst acknowledges, it is hard to be sure that this analysis is picking up causal effects of state pre-K participation on subsequent test scores. There are many unobservable state characteristics that may also affect test scores, and that might be correlated with state pre-K participation.  For example, Dr. Whitehurst is concerned in an endnote that “states that have invested the most in their pre-K programs [may] have also been more active than other states in instituting other education reforms”.

My comment: Given the problems with unobserved state characteristics, this analysis should be considered much less authoritative than previous pre-K studies that do have good comparison groups.  Of most relevance are programs similar to most state pre-K programs. This includes the Chicago Child-Parent Center program, which has a good comparison group, and both short-term and long-term evidence for significant benefits.  It also includes the many regression discontinuity studies of state pre-K programs that show short-term test score benefits, which I summarized in my recent paper on Kalamazoo’s pre-K program.  If we’re looking to predict test score effects of the Obama pre-K program, we should be looking at these studies with good comparison groups, not relying on regressions with aggregate state data that cannot control for unobserved state characteristics.

Dr. Whitehurst’s spin on his analysis is that it shows only modest effects of a state’s pre-K enrollment on NAEP test scores, which he argues means that any proposed expansion of such programs will need to be carefully targeted and designed in order to have net benefits:

“What do I conclude?

  1. There are modest positive associations between enrollment levels in state pre-K and later academic achievement once demographic differences among states are taken into account.

  2. If these associations reflect a cause and effect relationship then raising the level of state pre-K enrollment would enhance academic achievement.

  3. The impact of very substantial increases in the level of state pre-K enrollment (e.g., two standard deviations of current enrollment levels or about 32%) would likely be no more than a few points on NAEP.

  4. Raising NAEP scores a couple of points is worth pursuing in the context that national NAEP scores on reading in 4th grade were only four points higher in 2011 than they were in 1992, but this falls far short of the impacts that advocates of the expansion of state pre-K have touted based on extrapolations of findings from studies of a few high-cost, multiyear, boutique preschool programs from many years ago, such as Perry.

  5. If we are to move forward with a new federal program to support state pre-K programs we need to think carefully about the costs and benefits and figure out how to minimize the former while maximizing the latter. “

My comment: Of course, I agree we need to think carefully about benefits and costs of any new federal program, whether pre-K or any other proposed program. (I’m an economist and policy wonk; how could I be against benefit-cost analysis?)

However, I think that Dr. Whitehurst’s numbers as presented actually suggest extremely high benefit-cost ratios for expansion of state pre-K programs, perhaps even comparable to some figures for Perry.  The underlying reason is that even very slight increases in NAEP test scores have large predicted future benefits, while the “very substantial” increases in state pre-K enrollment he discusses would have quite modest costs.

To see this, I think it’s easier to do the analysis in terms of the benefits and costs from having one more child in a state pre-K program.  Whitehurst’s model as presented assumes that benefits and costs are scaled up or down proportionately with the number of children participating, so benefit-cost ratios do not depend on program size.

Whitehurst’s estimates imply that going from 0% of a state’s 4-year-olds in pre-K, to 100% of all 4-year-olds in pre-K, would raise NAEP scores by 7 points. It is as if each child who participates in pre-K raises his or her NAEP scores by 7 points. (Or alternatively, that child may raise his or her own test scores by less than 7 points, but have spillover benefits for other students that add up to a total of a 7 point increase in NAEP scores.)

A seven point increase in NAEP scores is about an “effect size” of 0.2.  (“Effect size” is education jargon for the test score increase divided by the “standard deviation” of test scores among children, in this case the standard deviation of 4th grade test scores.) This is equivalent near the median test score to an increase in test scores of about 8 percentiles. For example, if the child would have scored at the median or 50th percentile, pre-K participation would increase their 4th grade test scores by 8 percentiles to the 58th percentile.
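Under the usual assumption that test scores are roughly normally distributed, this effect-size-to-percentile conversion can be checked in a few lines. This is just a sketch: the implied standard deviation of roughly 35 NAEP points simply follows from dividing the 7-point gain by the 0.2 effect size.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

effect_size = 7 / 35            # 7 NAEP points over an implied ~35-point SD = 0.2
start = normal_cdf(0.0)         # a child at the median (50th percentile)
end = normal_cdf(effect_size)   # the same child after a 0.2 SD test score gain
gain = 100 * (end - start)
print(f"Percentile gain near the median: {gain:.1f}")  # about 8 percentiles
```

The "near the median" qualifier matters: because the normal distribution is densest in the middle, the same 0.2 standard deviation gain moves a child fewer percentiles in the tails.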

What are the benefits from an increase in 4th grade test scores of 8 percentiles? One benefit is the predicted increase in future earnings because students with higher test scores will have higher educational attainment, higher employment rates and higher wage rates.  The analysis by Chetty et al. suggests that an increase of one percentile in 4th grade test scores increases adult earnings by about 0.6%. (Wonk note: This takes the Chetty et al. figures from column 1 of Appendix Table V, and divides by mean earnings in Table I.) Therefore, an increase in a child’s 4th grade test scores of 8 percentiles would increase his or her future adult earnings by a little less than 5%.

Using some Current Population Survey figures on adult earnings by age, developed for my 2011 book on pre-K, the average present value for an individual worker of adult earnings in the U.S., discounted back to age 4 at a 3% real discount rate, is about one-half million dollars. A 5% increase would increase the present value of earnings by about $25,000. (All figures have been adjusted to 2012 dollars.) Even a slight increase in worker skills and earnings, when added over a worker’s entire career, can sum to a considerable benefit.

Obviously one could tweak these numbers to get different estimates. But the result remains that one would expect that these NAEP test score increases would be associated with adult earnings increases whose present value is at least in the tens of thousands of dollars per individual worker.

What does pre-K cost for this one child? Well, back in 2006, state spending per pre-K participant averaged a little less than $4,000 per child. (Source: The same NIEER report used by Whitehurst, but I adjusted the dollar figures to 2012 dollars.) There also was some local spending per pre-K participant. However, it seems unlikely that local spending averaged more than $1,000 per participant. So the total spending per child was probably less than $5,000.
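To make the arithmetic of the last few paragraphs explicit, here is a minimal sketch. Every figure comes from the discussion above (2012 dollars); nothing here is a new estimate.

```python
# Back-of-the-envelope benefit-cost arithmetic; all figures from the text above.
percentile_gain = 8                  # 4th grade test score gain implied by Whitehurst's regression
earnings_pct_per_percentile = 0.6    # Chetty et al.: % adult earnings gain per test score percentile
pv_career_earnings = 500_000         # approx. present value of career earnings at age 4, 3% real discount

earnings_gain_pct = percentile_gain * earnings_pct_per_percentile   # ~4.8%
pv_benefit = pv_career_earnings * earnings_gain_pct / 100           # ~$24,000 per child

cost_per_child = 5_000               # ~$4,000 state + <$1,000 local spending in 2006
ratio = pv_benefit / cost_per_child  # roughly 5 to 1
print(f"PV earnings benefit: ${pv_benefit:,.0f}; benefit-cost ratio: {ratio:.1f} to 1")
```

The point of writing it out is to show how short the chain is: one could tweak any single input, but the benefit side starts at tens of thousands of dollars per child while the cost side starts at a few thousand.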

What do I conclude?

  1. Based on Whitehurst’s regression, state pre-K programs would be predicted to increase the present value of future adult earnings by over $25,000 per individual child, at a cost of $5,000 per child, for a benefit-cost ratio of 5 to 1.
  2. The earnings benefits might be even greater if one looked at kindergarten test score effects. Pre-K studies show substantial fading of test score effects from kindergarten to 4th grade. Yet the Chetty study as well as other studies (for example, of the Chicago Child Parent program) suggest that the initial test score effects are a better predictor of long-run effects on adult earnings.
  3. In addition, if state pre-K programs have any anti-crime benefits or other benefits, this would substantially increase the benefit-cost ratio. In many studies, including Perry and the Chicago program, the anti-crime benefits of pre-K are of similar magnitude to the earnings benefits (see Bartik, Gormley, and Adelstein article for review and links).
  4. As presented, the Whitehurst estimates provide no basis for different benefit-cost ratios at different scales of pre-K expansion. The benefits and costs per participating child would seem to be the same, keeping the benefit-cost ratio the same. Increasing pre-K participation by only 32% rather than 100% would have only 32% as great benefits, but also 32% as great costs.  Whitehurst gets more modest test score effects from a 32% increase in pre-K participation because these test score increases are averaged over all kids. But he does not point out that the costs averaged over all kids will also be scaled back.
  5. All these calculations rest on a regression that may be biased due to unobserved state characteristics. Better calculations should be based on the Chicago CPC studies and the state pre-K regression discontinuity studies and other state studies. These studies often show high benefit-cost ratios.  For example, a recent CPC study shows a benefit-cost ratio from a one-year pre-K program of over 13 to 1.  Even if typical state programs were less cost-effective than CPC, they could still have very large benefit-cost ratios.

Dr. Whitehurst finishes his article with his brief thoughts on increasing the efficiency of pre-K programs. His suggested pre-K reforms are not given much justification in the article. Rather, they are rationalized by the notion that expanding pre-K programs as they currently exist, without dramatic reforms, would not have high enough benefit-cost ratios, which he believes to be the implication of his analysis of NAEP test scores. However, as I have argued here, his analysis actually implies that state pre-K programs as they currently exist have high benefit-cost ratios. His analysis rests on weak empirical evidence, as he acknowledges, but his results are consistent with previous research with better comparison groups.  Dramatically increasing the efficiency of pre-K programs is a desirable goal if it is achievable with well-researched policies. But in my opinion, even without dramatic reforms, we already know how to run pre-K programs at a state level that have high benefit-cost ratios, and that offer broad access to many income groups.

Posted in Early childhood program design issues, Early childhood programs | Comments Off on Brookings article provides support for high benefit-cost ratios for state pre-K, but you wouldn’t know it from the article

New study shows large effects of high-quality pre-K for a broad-access program

My new study of an innovative pre-K program in Kalamazoo County, Michigan, has just been posted at the Upjohn Institute website.

The Kalamazoo County program is called “Kalamazoo County Ready 4s”, or KC Ready 4s.  KC Ready 4s aims to move towards universal access to high-quality pre-K for all four-year olds in Kalamazoo County.  The program seeks to do so in part by providing training and assistance for local pre-K providers to improve their quality. In addition, KC Ready 4s provides tuition assistance on a sliding fee scale for 4-year olds to attend approved pre-K programs. The program currently assists over 100 4-year olds in Kalamazoo County, with plans for expansion as funds permit.  Because the program aims at universal access, the program provides tuition assistance to families at a wide variety of income levels, including many working class and middle class families. I currently serve on KC Ready 4s’ Board.

My study estimates the short-run effects of the KC Ready 4s program on kindergarten entry test scores. The paper finds that the program leads to test score effects which are quite large.

Specifically, the study finds effects on kindergarten test scores equivalent to an increase of at least 19 percentiles; for example, a child who would have been at the 31st percentile on test scores would instead score at the 50th percentile, the median.  This represents an increase of at least 50 percent in what children would otherwise learn without pre-K.

Based on previous studies of the relationship between kindergarten test scores and adult earnings, these test score gains would be predicted to increase adult earnings by around 10 percent, which would be an increase of many tens of thousands of dollars. Because the program’s half-day pre-K costs about $4,500 per child, the implication is that the earnings benefits of the program alone would clearly pass a benefit/cost test.
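The same style of back-of-the-envelope calculation applies here. Note that the half-million-dollar present value of career earnings is the rough figure I used in the Whitehurst discussion above, carried over as an assumption; this is an illustrative sketch, not the study's own computation.

```python
# Illustrative sketch; the PV of career earnings is an assumption carried over
# from the earlier discussion, not a number from the Kalamazoo study itself.
earnings_gain_pct = 10        # predicted adult earnings gain from the 19-percentile test gain
pv_career_earnings = 500_000  # assumed present value of career earnings (2012 dollars)
cost_per_child = 4_500        # approximate half-day pre-K cost per child

pv_benefit = pv_career_earnings * earnings_gain_pct / 100  # "many tens of thousands of dollars"
ratio = pv_benefit / cost_per_child
print(f"PV earnings benefit: ${pv_benefit:,.0f}; benefit-cost ratio: {ratio:.0f} to 1")
```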

What relevance does this study have to the continuing national debate over pre-K? One important relevant point is that the study adds to evidence that pre-K programs that include working class and middle class families can provide benefits that exceed costs.  We already have such evidence from studies in Tulsa and Boston.  Kalamazoo now provides additional evidence.

There is a strong political case for designing pre-K programs that have broad benefits for many families. The evidence also increasingly indicates that there is a strong economic case for broad-access programs, because many children, not just low-income children, can benefit from public support for high-quality pre-K.  Finally, by broadening access to pre-K, we also have a greater impact on local economic development, because by doing so we have a much greater impact on the overall quality of the local workforce.

Posted in Distribution of benefits, Early childhood programs | 1 Comment

Public radio interview about Michigan’s expansion of pre-K

I was recently interviewed by the local public radio station, WMUK, about the recent legislative agreement on the expansion of Michigan’s state-funded pre-K program, the Great Start Readiness Program. Governor Snyder’s proposal for expanded pre-K has now passed both houses of the legislature, and a conference committee has reconciled differences between the two bills.  The next step will be legislative enactment of the conference committee bill and the Governor’s signature.

As I said in my interview, this bill represents significant progress for the state of Michigan on early childhood issues. The bill should increase the number of slots in the state’s pre-K program by about one-quarter. The percentage of the state’s four-year-olds in the state pre-K program should increase from about 19% to about 24%.  In addition, the bill’s increase of the state funding per half-day pre-K slot from $3,400 to $3,625 represents the first increase in real state funding per slot in 10 years. This increase will help maintain program quality and encourage expansion of the program.

On the negative side, the $3,625 is still inadequate. A high-quality half-day slot probably costs around $4,500, to allow for adequate teacher salaries to attract quality teachers, and to pay for other needed costs. For most of GSRP’s history, its funding per slot, in today’s dollars, has been over $4,000.

Furthermore, Michigan is still well below leading states, such as Oklahoma, in the percentage of 4-year-olds in state-funded pre-K. Oklahoma has 74% of all 4-year-olds in state-funded pre-K, which is over triple the percentage of Michigan even after this expansion.

Finally, the new legislation attempts to increase the income targeting of the program through somewhat complicated administrative procedures that include segmenting applicant families by income and trying to prioritize lower-income families.  But we know that pre-K has benefits for families from a wide variety of income levels. In addition, income-integrated pre-K has more positive peer effects. Given limited funding, one can understand the legislature’s interest in targeting the program on lower-income families. But this could be done in simpler ways, such as having a straightforward sliding-fee scale system that would target program resources while not promoting undue income segregation.

However, the legislation represents an important step forward for Michigan. It will be more effective in the long-term if this first step is followed by further reforms in later years. Such reforms should include a gradual expansion of the state program, increases in real per-slot funding, and a more moderate approach to income targeting that allows for income-integrated programs.

Posted in Early childhood program design issues, Early childhood programs | Comments Off on Public radio interview about Michigan’s expansion of pre-K

Comments on Heckman book, “Giving Kids a Fair Chance”

Nobel prize-winning economist James Heckman has a recent (March 2013) short book, Giving Kids a Fair Chance. The book has a short essay (about 40 small pages) by Heckman, followed by comments on Heckman’s essay by 11 commentators with a wide range of expertise and ideology, and then followed by a brief reply by Professor Heckman.

The book is an excellent introduction to Professor Heckman’s approach to thinking about early childhood programs. It also provides some insights into some ideas that might be usefully added to Professor Heckman’s approach.

The essay summarizes Professor Heckman’s three main points about early childhood development. First, although both cognitive and non-cognitive skills are important to a person’s success in life, our approach to educational and social policy has sometimes over-stressed cognitive skills. Second, the development of both cognitive and non-cognitive skills is often impaired in disadvantaged families, which includes many low-income families, although we should recognize that many children in higher income families are also at risk of being disadvantaged in early childhood development.  Third, early childhood programs that address both cognitive and non-cognitive skills often have very high rates of return, particularly for disadvantaged families, and many later interventions have much lower rates of return.

Professor Heckman argues for the effectiveness of early childhood programs on the basis of randomized experiments such as the Perry Preschool program, the Abecedarian program, and the Nurse Family Partnership program. He also acknowledges the point made by commentator David Deming that the recent experimental finding that Head Start test score effects fade is not strong evidence against early childhood programs, because so many children in the control group were enrolled in other preschool programs.

I would add the following points to Heckman’s case. First, I believe there is stronger evidence than noted by Professor Heckman for “universal” benefits of early childhood programs.  Preschool appears to have benefits for middle-class as well as low-income children, as shown by studies in both Tulsa and Boston. In addition, as argued by one of the commentators (Robin West), and acknowledged in Professor Heckman’s reply, early childhood programs can benefit a significant number of parents by providing affordable quality child care. This allows parents to build job skills via work and education. These better outcomes for parents also help their children, by boosting family living standards, as pointed out by commentator Mike Rose and acknowledged by Heckman.

Second, I believe there is stronger evidence than noted by Professor Heckman for the effectiveness of some later interventions.  He acknowledges the point of some commentators (e.g., Carol Dweck) that the research evidence suggests that non-cognitive skills are quite malleable in adolescence and into the later 20s. I would add to this that there is considerable evidence (for example from my colleague Kevin Hollenbeck ) that later training and education interventions can be very effective if they target specific job skills that are in high demand. Even if IQ and non-cognitive skills are not malleable at later ages, job training programs can usefully provide affordable skills training that is attuned to labor market needs.

What is true is that early interventions can more easily provide broader and more profound benefits for a wide variety of persons. As Heckman has emphasized, later educational and training investments build on earlier investments, and therefore the rate of return to later investments may be greater for those who have received earlier investments. Where I differ is that the public sector can usefully increase the efficiency of these later investments.

Third, I think it is useful to emphasize that these early childhood investments not only benefit the assisted children and their parents, but also have large spillover benefits for the broader society.  I don’t think that Professor Heckman would disagree with the presence of such spillover benefits, as they are widely acknowledged by economists, but it is useful to point out these social spillovers to an audience that might be suspicious about paying higher taxes to benefit “other people’s children”. These spillovers include not only lower crime and welfare costs, but also positive benefits in the form of higher wages and employment rates, because early childhood programs will promote local job growth and wage growth. In my opinion, these positive local labor market benefits are particularly politically appealing. (A skeptic could argue that there are alternative cheaper policies that directly sanction crime and welfare use.)

Fourth, the book only begins to get into the political challenges facing any proposed expansion of early childhood programs. As commentator Robin West points out, any government intervention in early childhood raises all kinds of fears and political resistance.  Overcoming such political resistance requires offering salient benefits for a sufficient number of groups.  Professor Heckman briefly mentions in his essay the possibility of offering universal programs, with some fees for upper-income families, to avoid the programs being stigmatized by being associated only with the poor. Commentators Adam Swift and Harry Brighouse also emphasize the political importance of universality.

Overall, this book provides an excellent introduction to the debate over early childhood programs. But adoption of early childhood programs will depend at least as much on getting the politics right as on finding arguments that appeal to policy wonks like me.

Posted in Early childhood program design issues, Early childhood programs, Economic development | 4 Comments

Moving the U.S. towards a more universal, high-quality early education system

Lane Kenworthy, a well-known comparative sociologist of inequality issues at the University of Arizona, has a thought-provoking blog post on why the U.S. should move towards a high-quality early education system.

Based on his own extensive knowledge of Scandinavian social welfare systems, Kenworthy advocates that the U.S. move towards an early childhood system similar to the systems used in Denmark and Sweden.  This system would include: paid parental leave during a child’s first year; high-quality child care and preschool from ages 1 to 5, with parents paying fees capped at 10% of income, and with the rest of the costs of this high-quality system being supported by a government subsidy. He roughly estimates that such a system would cost about 1% of U.S. GDP, or $160 billion in annual spending.

I agree with Kenworthy’s policy goals, and think he provides some valuable comparative evidence, as well as summarizing much research on this issue. Therefore, perhaps the most important point of this comment is that you should read his blog post and the research underlying Kenworthy’s blog post.

I have a few comments on Kenworthy’s article that I would add.

First, Kenworthy comments that it’s too early to tell whether the universal preschool programs in Oklahoma and Georgia have had long-run effects. I would add, as I’ve detailed in a previous post, that it is more difficult than intuition suggests to detect statistically significant effects, even of large benefits, from case studies of the experiences of one or two states. There are simply too many factors driving aggregate educational and economic performance.

Therefore, evaluations of the quality of the research evidence for early childhood programs should rest more on well-done studies of what happens to program participants and their families compared to similar non-participants.  It is harder to get precise results from aggregate studies of impact.

Second, Kenworthy wonders whether we can detect effects of universal early education systems on long-run growth. I regard the issue of whether early education, or education in general, affects growth rates, as opposed to levels of income and earnings, as of great importance. My own research has been quite conservative and simply assumed that improving job skills via early childhood education has effects on productivity levels and earnings levels.  But if one instead assumes that these programs affect growth rates in productivity and earnings, then the resulting long-run returns to these programs can be many multiples of what I assume. This is documented in the work by Bill Dickens and his colleagues in applying various economic growth models to early childhood programs.

The problem is that such effects on economic growth are quite lagged, and therefore it will always be hard to link economic growth effects with a specific policy. However, even a very small effect on annual economic growth rates will over the long-run yield huge benefits. Therefore, I regard trying to detect such long-run growth effects a topic worthy of sustained research attention.
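To illustrate why even a small growth-rate effect matters, consider a toy compounding example. The 0.1 percentage point boost below is purely hypothetical, chosen only to show the mechanism; it is not an estimate from any of the studies discussed.

```python
# Toy example: a hypothetical 0.1 percentage point boost to annual growth.
baseline_growth = 0.020   # assumed 2.0% annual productivity growth
boosted_growth = 0.021    # 2.1% with the hypothetical boost
years = 50

# Ratio of income levels after compounding both paths for 50 years.
gap = (1 + boosted_growth) ** years / (1 + baseline_growth) ** years - 1
print(f"After {years} years, income levels are {gap:.1%} higher")  # about 5% higher
```

A permanent 5% increase in income levels, applied to an entire economy, dwarfs the cost of most early childhood proposals, which is why growth-rate effects, if real, would multiply the returns.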

Third, Kenworthy argues that in addition to government subsidies for early education, we also need government to directly provide early education, because “that’s the only way to guarantee universal access to preschool and care that’s above an acceptable quality threshold.” I don’t know if the research evidence for this statement is clear. I could imagine a charter system or voucher system for early education that could be universal and of high quality. The question is whether the politics of charter systems or voucher systems leads to pressure to water down quality standards and universality, as such systems can be gamed to provide larger benefits for middle-class groups and private providers.  I would be willing to be persuaded that some public provision is essential, but I think this requires more research evidence and a more extended argument.

Finally, I would argue that although a universal system from birth to age 5 is desirable, the best in this case may be the enemy of the good. I think there is a significant argument that the goal of a universal early childhood system may need to be reached by gradual steps towards that goal. A reasonable first step may be to move towards universal access to full-day preschool at age 4, which would cost less than one-fifth of the price tag for Kenworthy’s ideal system. The lower price tag makes this first step more politically achievable.

Furthermore, the evidence suggests that preschool by itself probably has among the highest benefit-cost ratios for effects on child development for a wide variety of children. Rates of return for child development effects of child care are certainly high enough to justify public subsidy, but tend to be somewhat lower than for preschool because of the high cost of comprehensive child care for several years. Furthermore, the child development benefits of child care subsidies may be more targeted on the disadvantaged than is true of preschool, which provides services that are difficult for almost all parents to provide on their own.

At the same time, we should keep the ideal of a universal comprehensive early childhood system in mind. What is politically practical changes with the times. There may be opportunities that arise to enhance the quality of America’s child care system, and to better manage and coordinate the complex array of child care subsidies through both government agencies and the tax system. Kenworthy’s article provides a good description of an ideal that is attainable, if we have the political will.

Posted in Early childhood program design issues, Early childhood programs, Economic development | Comments Off on Moving the U.S. towards a more universal, high-quality early education system