Pre-K policy should be based on all the evidence, not one study of one state’s programs

Dr. Grover Whitehurst’s latest criticisms of Obama’s preschool plan at the Brown Center website at the Brookings Institution have drawn some attention. He has done numerous posts criticizing Obama’s preschool plan, some of which I’ve responded to in previous posts.

Dr. Whitehurst’s latest criticism is based on recent evidence from the Vanderbilt study of the Tennessee Voluntary Pre-K Program. This study used a randomized control trial methodology. The results suggest that most of the academic and behavioral effects of Tennessee’s pre-K program had faded by the end of kindergarten and the end of first grade.

Dr. Whitehurst argues the following in his concluding paragraph:

“I see these findings as devastating for advocates of the expansion of state pre-k programs. This is the first large scale randomized trial of a present-day state pre-k program. Its methodology soundly trumps the quasi-experimental approaches that have heretofore been the only source of data on which to infer the impact of these programs. And its results align almost perfectly with those of the Head Start Impact Study, the only other large randomized trial that examines the longitudinal effects of having attended a public pre-k program. Based on what we have learned from these studies, the most defensible conclusion is that these statewide programs are not working to meaningfully increase the academic achievement or social/emotional skills and dispositions of children from low-income families. I wish this weren’t so, but facts are stubborn things. Maybe we should figure out how to deliver effective programs before the federal government funds preschool for all.”

I have a number of detailed responses to this argument. But to sum up:

Incomplete findings from one good but imperfect study of one state’s quite imperfect pre-K program do not trump the many good studies of many pre-K programs that show that such programs can be effective, with the right resources and design. It is unwise for either opponents or proponents of expanded pre-K to over-react to one study; rather, decisions should be based on the overall weight of the research evidence.

What follows are more detailed responses:

1. I agree with Sara Mead’s comment at Education Week that one can hardly view the latest Tennessee Pre-K results as an argument in favor of pre-K. On the other hand, I also agree with her point that this one piece of evidence does not trump all the other good evidence for pre-K effectiveness from numerous studies.

2. Dr. Whitehurst is of the opinion that randomized control trials trump all other evidence by far. I disagree. Why do I disagree? First, randomized control trials in practice are hard to run perfectly, which often limits their advantages compared to non-randomized studies.

Second, many non-randomized studies have good comparison groups. For example, the Chicago Child-Parent Center studies compare similar neighborhoods with different pre-K access. Some Head Start studies compare siblings in the same family who differ in Head Start enrollment, or counties that differ in Head Start access because of federal policies. The regression discontinuity studies of state pre-K compare kids whose timing of pre-K access differs because of birth date. All of these studies with good comparison groups find some good evidence of pre-K’s effectiveness.

We don’t just throw all this info out because of one study of one program in one state. This would be true even if this Vanderbilt study had no issues and one thought that Tennessee’s program was the best program in the country.

3. In the particular case of the Tennessee evaluation, there were some problems with the randomized trial, particularly in Cohort 1 (2009-2010 pre-K participants), in that the parent consent rates were low and differed between Tennessee pre-K participants and non-participants. For example, in Cohort 1 of the study, the researchers only had parental consent to look at the data for 46% of pre-K participants and 32% of non-participants. This improved in the second cohort (pre-K participants in 2010-2011) to 74% of participants and 68% of non-participants providing parental consent. Dr. Whitehurst explicitly says that he focuses his attention on the evidence only in cases where more data was provided due to parental consent.

The problems with parental consent mean that for most of the comparisons, the actual children on whom data were collected no longer constitute a pure random assignment experiment, particularly in Cohort 1. In other words, it could well be that in Cohort 1, although the full treatment sample and full control sample might on average be similar in unobserved characteristics (e.g., parent motivation), as the initial assignment was determined randomly, this might not be at all true of the 46% of pre-K participants and 32% of non-participants for whom most of the data are available. Parental consent may not be random with respect to unobserved characteristics of children and families.

The Vanderbilt researchers tried very hard to control for this problem, using appropriate methods. However, these methods, such as propensity score matching and statistical controls, are the same methods that people use WITHOUT random assignment data, and have the same issue — one can only control for variables one observes, not variables one does not observe. Furthermore, there are many modeling choices in dealing with these issues, and different modeling choices may yield different results.
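To make the consent problem concrete, here is a minimal arithmetic sketch. It is an illustration under invented assumptions, not a reconstruction of the Vanderbilt data: the two family types, their scores, and the consent rates by type are all hypothetical. Only the overall Cohort 1 consent rates of 46% and 32% come from the study as described above. The sketch assumes the program has no true effect and that both arms are identical at assignment.

```python
# Hypothetical illustration: everything except the 46% / 32% overall consent
# rates is invented for this sketch, not taken from the Vanderbilt study.

def consent_rate(shares, consent):
    """Overall share of families giving consent."""
    return sum(shares[k] * consent[k] for k in shares)

def observed_mean(shares, scores, consent):
    """Mean score among consenting families only."""
    weights = {k: shares[k] * consent[k] for k in shares}
    return sum(weights[k] * scores[k] for k in shares) / sum(weights.values())

# Random assignment: both arms have the same mix of (unobserved) family types.
shares = {"high_motivation": 0.5, "low_motivation": 0.5}
# Assumed zero true program effect: scores depend only on family type.
scores = {"high_motivation": 60.0, "low_motivation": 40.0}

# Assumed consent rates by type, chosen to average 46% and 32%.
consent_treat = {"high_motivation": 0.52, "low_motivation": 0.40}
consent_ctrl = {"high_motivation": 0.44, "low_motivation": 0.20}

treat_consent = consent_rate(shares, consent_treat)
ctrl_consent = consent_rate(shares, consent_ctrl)
consented_gap = (observed_mean(shares, scores, consent_treat)
                 - observed_mean(shares, scores, consent_ctrl))

print(treat_consent, ctrl_consent, consented_gap)
```

Under these made-up numbers, the consenting treatment group averages about 51.3 and the consenting control group 53.75, so the program appears to lower scores by roughly 2.4 points even though its true effect is zero by construction. Different assumed consent patterns could push the bias in either direction, which is the point: without observing the trait that drives consent, one cannot tell.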

4. It is of interest that for one of the variables, retention in kindergarten, for which info is available for the full sample, the pre-K program appears to cut retention in kindergarten from 8% to 4%. That is curious if there are really no end of kindergarten effects on achievement or behavior, which is what the data on the smaller sample suggest. Why would the retention rate be cut in half? Something must be going on to produce this result that we don’t observe for the smaller sample.

Furthermore, in the smaller sample for which parental consent WAS obtained, retention was only cut from 6% to 4% — which is a curious discrepancy between the smaller sample, on which Dr. Whitehurst bases his conclusions, and the full sample.

5. The retention differences mean that more of the weaker pre-K students get promoted to first grade on time, which may be good for them, but which will tend to depress end of first grade scores in the treatment group relative to the control group.
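One way to read this composition effect is sketched below. It is a deliberately simplified illustration: the 1-to-100 score scale, the identical score distributions in both groups, and the assumption that exactly the lowest scorers are retained and then excluded from the first-grade comparison are all hypothetical. Only the 8% and 4% retention rates come from the full-sample figures above.

```python
# Hypothetical illustration: the score scale and the rule that the lowest
# scorers are the ones retained are invented for this sketch. Only the 8% and
# 4% retention rates come from the post.

def promoted_mean(scores, retention_rate):
    """Mean score of children promoted on time, assuming the lowest scorers are retained."""
    ranked = sorted(scores)
    n_retained = round(len(ranked) * retention_rate)
    promoted = ranked[n_retained:]
    return sum(promoted) / len(promoted)

# Assume both groups have the same underlying scores (no true difference).
scores = list(range(1, 101))

control_mean = promoted_mean(scores, 0.08)    # 8% retained without pre-K
treatment_mean = promoted_mean(scores, 0.04)  # 4% retained with pre-K
gap = treatment_mean - control_mean

print(control_mean, treatment_mean, gap)
```

With these assumptions, the promoted control children average 54.5 and the promoted treatment children 52.5, a 2-point apparent disadvantage for the pre-K group that comes entirely from who is left in the comparison. How large any such effect is in the actual study depends on how retained children were tested and counted, which this sketch does not try to represent.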

6. Tennessee’s pre-K program appears to spend, according to NIEER, about $5,814 annually per child for a full-day program. Data from the Institute for Women’s Policy Research suggest that high-quality full-day pre-K might cost $9,000 or so annually per child. Tennessee has a lower cost of living and lower teacher salaries, but there does seem to be some gap there. NIEER estimated that Tennessee probably needs to spend at least $2,000 extra per child to consistently deliver quality.

7. As Steve Barnett of NIEER has pointed out, Tennessee’s program results at the end of the pre-K year were on the low end compared to some other state pre-K studies. Perhaps end of pre-K results are more likely to persist if the initial end of pre-K results are larger. Perhaps there is some critical size of effects that one needs to get at the end of pre-K before one can expect much persistence.

8. Sara Mead also raises the point that there may be effects of collective pre-K that differ from individual pre-K. That is, if one puts an entire class through pre-K, and combines this with the right K-3 policies, then teachers in K-3 can teach the entire class more effectively to a higher level. On the other hand, if we just put a few kids through pre-K, then teachers may find that they have to teach the same curriculum at the same pace to meet the needs of the overall class. This result may tend to drag down any initial advantages for the pre-K kids, particularly if the initial advantages at kindergarten entrance are small.

It is of interest here that the Chicago Child-Parent Center study essentially was comparing kids in different neighborhoods that were similar in neighborhood characteristics except for whether they had the CPC program. Did CPC help allow subsequent classroom teaching to improve? Maybe.

9. In general, one has to ask why studies sometimes find fade-out, with the control or comparison groups catching up to the treatment group. Some of it may be that all kids are experiencing the same curriculum, which will tend over time to reduce performance differences in the individual comparisons. Another issue is that it is quite possible that teachers are intervening to provide extra help to kids who are behind. If there are initially more of such kids in the control group, then more kids in the control group will get such help. But this is actually another benefit of pre-K: it may reduce the need for teachers to provide remedial help to the pre-K kids, and free up teacher time to do other things.

10. Having said all that: the latest Tennessee Pre-K results do not provide any strong evidence in favor of pre-K. Maybe it is due to lack of full data on all survey respondents or limitations of Tennessee’s program or the lack of community effects in such a study, to reiterate the points mentioned above. It is hard to be sure without better data, ideally on the entire Tennessee sample, and more in-depth studies of what is going on in Tennessee, for example compared to Tulsa or New Jersey or Boston.

On the other hand, I don’t think the latest Tennessee results provide any strong evidence against the general consensus of the research literature, that many state and local pre-K programs are quite effective.

11. Is the implicit message from Dr. Whitehurst that a pre-K program for which we ONLY have evidence for effects at pre-K exit or kindergarten entrance is of no use? Does that really make sense? Is that really a tenable position? Is that the attitude of most middle-class parents — “We don’t care about whether our child is ready for kindergarten, because we’re sure that any initial advantages will fade.” This needs to be thought through. And one needs to think through why fade-out might occur and what it might mean.

12. Finally, what we really should be talking about is how we can replicate state and local pre-K programs that show much larger effects than in Tennessee, such as the programs in Tulsa, Boston, or New Jersey.

6 Responses to Pre-K policy should be based on all the evidence, not one study of one state’s programs

Jan says:

November 23, 2013 at 9:40 am

Just want to thank you for your timely and articulate responses to Grover Whitehurst’s “analysis” and often dismal conclusions about Pre-K. As a professional “in the trenches” having to justify every penny spent with school board members in rural over taxed towns, your work is invaluable! If you’re ever looking for assistance, let me know.
john a says:

November 26, 2013 at 10:53 am

Tim, Are you as critical of studies that support your thesis as studies that do not? I think your argument would be stronger if you pointed out the strengths and weaknesses of each study you mention, not just point out the potential flaws in ones that are counter to your view. The Vanderbilt study is as strong as any study in this space. If you find the flaws in it make the conclusion unreliable, what does that mean for the studies supporting pre-K that are weaker in design?
Anyways, I do enjoy your blog. I think it’s a very complicated area and society would benefit from completely objective analysis of the evidence on both sides.
- timbartik says:
  
  November 26, 2013 at 12:00 pm
  
  John: You raise a good point. It is always good to try to be skeptical about evidence, particularly if the evidence is in accord with your previous views.
  
  I don’t think I said in my blog post that the Vanderbilt study is less reliable than other studies of pre-K. My point was that the study’s problems with survey response mean that it really can’t be classified as a “gold standard” random assignment study that is significantly more reliable than other studies. It is more in the silver standard category: interesting evidence on one program that has good controls, but might be biased due to unobservable variables that differ between treatment and comparison groups. The same is true of all the early childhood studies with the exception of Perry, Abecedarian, and NFP, which are gold standard random assignment studies.
  
  I think the Chicago studies, and the regression discontinuity studies of state pre-K, and the Head Start sibling studies, and some of the NIEER studies of pre-K that use matched comparison groups, are all good silver standard studies. What gives them greater weight is not each individual study, but rather the fact that so many studies point in the same direction. In addition, the pattern of results in the studies points towards reliability. It is difficult to see why biases due to unobserved attributes of the comparison groups would lead full-day pre-K to be found to be more effective than half-day pre-K (e.g., the Tulsa study), and two years more effective than one year (the Chicago study).
  
  The other point is that of course we might expect results to vary across different programs. Perhaps Tulsa and Chicago just have better programs than Tennessee.
  
  Now, one response to the fact that some programs might work, and others might not, is Whitehurst’s perspective: let’s put a hold on doing anything until “further research” is done, and we know exactly what works and what doesn’t work, using random assignment studies. But not moving forward has a tremendous opportunity cost: the children who might not receive program services that could dramatically change their life prospects.
  
  My perspective is that there is sufficient weight of evidence that we should move forward. We should try to imitate the more successful programs. We should if anything err on the side of “excessive” quality, in the sense that we might find that some quality features that are costly (e.g., small class sizes) could be moderated later on without substantial loss of benefits.
  
  And we should monitor our success. As I’ve mentioned in previous blog posts, we can do ongoing program monitoring by doing regression discontinuity studies of a random sample of pre-K programs. Over time, this should allow us to learn more about what’s working.
  
  Now, this regression discontinuity monitoring only looks at pre-K’s effects at kindergarten entrance. But unlike Whitehurst, I think the evidence is strong that a program that makes a truly BIG difference at kindergarten entrance will lead to long-run success, based on the work of Chetty and others, and based on evidence from Perry and Chicago. It’s the pre-K programs with small effects at kindergarten entrance for which we truly have to worry that fade-out may eliminate all long-run effects.
davematthews says:

November 27, 2013 at 9:00 am

“But this is actually another benefit of pre-K — it may reduce the need for teachers to provide remedial help to the pre-K kids, and free up teacher time to do other things.”

This is an essential point. I have never understood why “fade-out” proves that preschool programs are ineffective, given the enormous resources invested in K-12 remediation programs. It may just show that remediation tends to even things out in the long run – possibly at much greater expense.

Also, I don’t know why Dr. Whitehurst ignored the very low response rates – creating a huge self-selection bias – in the Tennessee study, if he is truly interested in finding out what works. With rates like that, it is simply false that the “methodology soundly trumps” anything.

Finally, having observed many Kindergarten and 1st grade teachers administering the cognitive tests, I would not put a lot of weight in a study that relies exclusively on the results of these tests.
- timbartik says:
  
  November 27, 2013 at 7:51 pm
  
  Dave:
  
  You raise some good points. However, to be fair, it should be noted that other studies that find state or local pre-K programs to be effective also base this claim on the results of standardized tests.
