The reliability of estimates of effects of state and local pre-K programs on kindergarten test scores

A recent article on pre-K that has gained some public attention (for example, in columns by Mona Charen and Reihan Salam) is “The Dubious Promise of Universal Preschool”, by George Mason professors David Armor and Sonia Sousa, published in the Winter 2014 edition of the journal National Affairs.

Professors Armor and Sousa argue against universal preschool principally on two grounds: first, that the random assignment study of Head Start shows that Head Start’s test score effects quickly fade out; and second, that the current studies estimating large effects of many state and local pre-K programs on kindergarten test scores may be biased upward.

On the first issue, I have addressed the controversy over Head Start in previous blog posts. In brief, other good studies of Head Start do suggest long-run effects on adult outcomes, even after many of the test score impacts fade. These long-run effects may reflect effects of Head Start on skills that are not well measured by standardized tests. In addition, a recent article by Steve Barnett of the National Institute for Early Education Research argues that recent efforts to reform Head Start may be increasing the program’s impacts on early literacy compared to its impacts during the period considered by the random assignment study.

On the second issue, I think Professors Armor and Sousa overstate their critique of recent studies of state and local pre-K programs; in fact, these studies provide good evidence that these programs significantly improve test scores at kindergarten entrance. I am hardly a disinterested observer, as I co-authored one of the studies they critique. But I think there is good evidence supporting the validity and reliability of these recent studies.

The studies that Armor and Sousa criticize use a “regression discontinuity” methodology. The basic idea is to administer the same tests, at the same time of year, to two groups of students: students just entering the pre-K program being evaluated, and students just entering kindergarten who have completed a year in that pre-K program. Because both groups of students come from families that chose to participate in pre-K, the two groups should be similar in both observed and unobserved characteristics.

There is one obvious difference between the two groups: the group entering kindergarten is on average a year older, and therefore would be expected to have higher test scores simply due to being older, even without pre-K. But we can statistically control for the effects of age on test scores, because we anticipate that age by itself will have a smooth effect on test scores. In most state and local areas, entrance into kindergarten and public pre-K programs is based on an age cutoff. In the sample, there are students in the entering pre-K group who just missed the age cutoff for entering pre-K the previous year, and there are students in the entering kindergarten group who just made the age cutoff for entering pre-K the previous year and for entering kindergarten this year. We expect to see a smooth increase in test scores with age within the entering pre-K group and within the entering kindergarten group, with a “jump” in test scores at the age cutoff if pre-K has a significant effect on test scores.

Intuitively, we are comparing students who are just a few days apart in age, and therefore are almost the same, and observing whether a year in pre-K has increased test scores. From a statistical perspective, however, we use a broader sample of ages to better separate the effects of age from the effects of pre-K on test scores.
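To make the estimation concrete, here is a minimal sketch of a regression discontinuity specification of the kind described above, run on simulated data. The variable names, the size of the simulated “pre-K effect,” and all other numbers are my own illustrative assumptions, not values from any of the studies discussed here.

```python
# A minimal sketch of the regression discontinuity design described above,
# on simulated data. All numbers and variable names are illustrative.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000

# Age in days relative to the entry cutoff: children at or above zero made the
# cutoff last year, attended a year of pre-K, and are tested as kindergarten
# entrants; children below zero just missed it and are entering pre-K now.
age_rel_cutoff = rng.uniform(-365, 365, n)
treated = (age_rel_cutoff >= 0).astype(int)

# Scores rise smoothly with age; pre-K adds a hypothetical jump of 5 points.
score = (50
         + 0.02 * age_rel_cutoff      # smooth effect of age
         + 5.0 * treated              # the "jump" at the cutoff (the pre-K effect)
         + rng.normal(0, 10, n))      # noise

df = pd.DataFrame({"score": score,
                   "age_rel_cutoff": age_rel_cutoff,
                   "treated": treated})

# The coefficient on `treated` estimates the effect of a year of pre-K,
# controlling for a (here linear) effect of age that may differ on each side.
rd = smf.ols("score ~ treated + age_rel_cutoff + treated:age_rel_cutoff",
             data=df).fit()
print(rd.params["treated"], rd.bse["treated"])
```

In the actual studies, the specification is typically richer, with more flexible functions of age and additional controls, but the key coefficient is the same: the jump in scores at the entry cutoff.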

As Professors Armor and Sousa acknowledge, these “regression discontinuity” studies of pre-K have found large effects of pre-K on test scores. These large effects occur in my study with Gormley and Adelstein of Tulsa pre-K, in previous studies of Tulsa pre-K, in a study I did of Kalamazoo pre-K, and in studies of pre-K programs in Boston, Oklahoma, Michigan, New Jersey, West Virginia, South Carolina, Arkansas, and New Mexico. These large effects are sufficient to predict sizable effects of pre-K on the adult earnings of former child participants, with earnings effects of up to 10% (Tulsa) or 15% (Boston). The ratio of the present value of these earnings effects to the costs of these pre-K programs is over 2 to 1 for middle-class children, and over 3 to 1 for low-income children.

Professors Armor and Sousa’s main critique of these regression discontinuity studies is that the estimates may be biased by sample attrition. Specifically, the group entering kindergarten does not include students who left the school district, and those departing students might tend to have lower test scores, on average.
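To see why this is a concern in principle, here is a small simulation of my own (not from the Armor and Sousa article): if lower-scoring children are disproportionately missing from the kindergarten-entrant side of the cutoff, the estimated jump can be positive even when the true pre-K effect is set to zero.

```python
# Illustration of the attrition concern: drop some low-scoring children from
# the kindergarten-entrant (treated) group and watch a spurious "jump" appear
# even though the true pre-K effect is zero. Simulated data only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 5000
age_rel_cutoff = rng.uniform(-365, 365, n)
treated = (age_rel_cutoff >= 0).astype(int)
# True pre-K effect is set to zero: scores depend only on age plus noise.
score = 50 + 0.02 * age_rel_cutoff + rng.normal(0, 10, n)

df = pd.DataFrame({"score": score, "age_rel_cutoff": age_rel_cutoff,
                   "treated": treated})

# Attrition: kindergarten entrants with low scores are more likely to have
# left the district, so they are missing from the tested sample.
leave_prob = np.where((df["treated"] == 1) & (df["score"] < 45), 0.5, 0.0)
stays = rng.uniform(size=n) >= leave_prob
df_obs = df[stays]

rd = smf.ols("score ~ treated + age_rel_cutoff + treated:age_rel_cutoff",
             data=df_obs).fit()
print(rd.params["treated"])   # spuriously positive despite a zero true effect
```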

Sample attrition is a concern for any evaluation study. For example, it can be a major problem in random assignment studies, as I pointed out in a previous blog post on the recent random assignment evaluation of Tennessee’s pre-K program.

However, for several reasons, I do not think that the existing regression discontinuity pre-K studies are seriously biased by attrition. First, if attrition caused serious biases, one would expect them to show up at the age cutoff as differences between the comparison group and treatment group in observable variables other than test scores. But these regression discontinuity studies test for such jumps in observable variables at the age cutoff, and do not find such differences.
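One simple version of such a check, sketched below with simulated data and hypothetical variable names, is to re-run the same discontinuity regression using a pre-determined characteristic (such as free-lunch eligibility or gender) as the outcome; an estimated jump near zero is evidence that attrition has not changed the mix of children around the cutoff.

```python
# Sketch of a covariate "balance" check at the age cutoff, on simulated data.
# A jump near zero (relative to its standard error) for each pre-determined
# characteristic suggests the two groups remain comparable at the cutoff.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 3000
age_rel_cutoff = rng.uniform(-365, 365, n)
df = pd.DataFrame({
    "age_rel_cutoff": age_rel_cutoff,
    "treated": (age_rel_cutoff >= 0).astype(int),
    # Pre-determined characteristics, unrelated to the cutoff by construction.
    "free_lunch": rng.binomial(1, 0.6, n),
    "female": rng.binomial(1, 0.5, n),
})

def cutoff_jump(data, outcome):
    """Estimate the jump at the age cutoff for a given outcome variable."""
    m = smf.ols(f"{outcome} ~ treated + age_rel_cutoff + treated:age_rel_cutoff",
                data=data).fit()
    return m.params["treated"], m.bse["treated"]

for covariate in ["free_lunch", "female"]:
    jump, se = cutoff_jump(df, covariate)
    print(f"{covariate}: jump at cutoff = {jump:.3f} (s.e. {se:.3f})")
```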

Second, in their study of Boston’s pre-K program, Professors Weiland and Yoshikawa adjust for sample attrition by reweighting their tested sample so that the reweighted sample has observable characteristics that resemble the original population, before attrition. (They can do this because they have information on the demographic and socioeconomic characteristics of students who entered the pre-K program, but who did not end up entering kindergarten in Boston Public Schools.) They find that this reweighting makes little difference to the estimated effects of pre-K.
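A stripped-down sketch of that general reweighting approach, on simulated data (my own illustration, not the Weiland and Yoshikawa code), is to model each pre-K entrant’s probability of remaining in the tested sample as a function of observed characteristics, and then weight the discontinuity regression by the inverse of that probability so that the reweighted sample resembles the original pre-K entry population.

```python
# Sketch of attrition reweighting: model each child's probability of remaining
# in the tested sample, then weight observed children by the inverse of that
# probability when re-estimating the discontinuity regression. Illustrative only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 4000
df = pd.DataFrame({
    "age_rel_cutoff": rng.uniform(-365, 365, n),
    "free_lunch": rng.binomial(1, 0.6, n),
})
df["treated"] = (df["age_rel_cutoff"] >= 0).astype(int)
df["score"] = (50 + 0.02 * df["age_rel_cutoff"] + 5.0 * df["treated"]
               - 3.0 * df["free_lunch"] + rng.normal(0, 10, n))

# Attrition that depends on an observed characteristic: low-income kindergarten
# entrants are less likely to remain in the tested sample.
stay_prob = np.where((df["treated"] == 1) & (df["free_lunch"] == 1), 0.7, 0.95)
df["observed"] = (rng.uniform(size=n) < stay_prob).astype(int)

# Step 1: model each child's probability of remaining in the tested sample.
attrit_model = smf.logit("observed ~ free_lunch * treated", data=df).fit(disp=0)
df["p_stay"] = attrit_model.predict(df)

# Step 2: re-estimate the discontinuity regression, weighting each observed
# child by the inverse of that predicted probability.
obs = df[df["observed"] == 1]
rd_weighted = smf.wls("score ~ treated + age_rel_cutoff + treated:age_rel_cutoff",
                      data=obs, weights=1.0 / obs["p_stay"]).fit()
print(rd_weighted.params["treated"])
```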

Both of these procedures rely on observable variables. One could still argue that even though attrition does not lead to biases due to observable variables differing at the cutoff between the comparison and treatment groups, it might lead to differences in unobservable variables between the two groups that could be correlated with test scores.

However, in my study of Kalamazoo’s pre-K program, I re-estimated the regression discontinuity model using only the same students: that is, I combined data on pre-K entrant scores from one fall with data on kindergarten entrant scores from the next fall, including only students who were observed in both years. In this model, there cannot be any differences in pre-existing observable or unobservable variables between the comparison and treatment groups, because they are the same students observed at two different times. I found that the estimated test score effects from this “panel data” model differed little from those of the more traditional regression discontinuity model, in which we compare all pre-K entrants with only the kindergarten entrants who did not leave the sample.
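In simplified form, that “panel data” check looks something like the sketch below (my own illustration, on simulated data): keep only children tested both as pre-K entrants in one fall and as kindergarten entrants the next fall, stack their two observations, and run the same discontinuity regression on that stacked sample.

```python
# Sketch of the "same students" panel check: each child contributes two
# observations, one at pre-K entry and one at kindergarten entry a year later.
# Because the groups contain the same children, fixed child traits (observed
# or unobserved) cannot differ across the cutoff. Simulated data only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 1500
ability = rng.normal(0, 5, n)           # an unobserved, fixed child trait
age_y1 = rng.uniform(-365, 0, n)        # age relative to cutoff, fall of year 1
in_both = rng.uniform(size=n) < 0.8     # children tested in both falls

rows = []
for i in range(n):
    if not in_both[i]:
        continue
    # Fall of year 1: tested at pre-K entry (not yet treated).
    rows.append({"age_rel_cutoff": age_y1[i], "treated": 0,
                 "score": 50 + 0.02 * age_y1[i] + ability[i] + rng.normal(0, 5)})
    # Fall of year 2: the same child tested at kindergarten entry, one year
    # older, after a year of pre-K (a hypothetical effect of 5 points).
    rows.append({"age_rel_cutoff": age_y1[i] + 365, "treated": 1,
                 "score": 50 + 0.02 * (age_y1[i] + 365) + 5.0 + ability[i]
                          + rng.normal(0, 5)})

panel = pd.DataFrame(rows)
rd_panel = smf.ols("score ~ treated + age_rel_cutoff + treated:age_rel_cutoff",
                   data=panel).fit()
print(rd_panel.params["treated"])   # recovers the effect using only the same children
```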

For all these reasons, although I certainly think sample attrition is an important issue that might conceivably bias the regression discontinuity pre-K studies, it does not appear that in practice attrition leads to any significant biases. Therefore, these regression discontinuity studies provide good evidence that pre-K increases kindergarten test scores by a sizable amount, which is likely to lead to large subsequent benefits for participants and for society.

About Tim Bartik

Tim Bartik is a senior economist at the Upjohn Institute for Employment Research, a non-profit and non-partisan research organization in Kalamazoo, Michigan. His research specializes in state and local economic development policies and local labor markets.