What is needed for large-scale evaluation of the effectiveness of pre-K programs?

The Center for American Progress recently released a report on improving the efficiency of publicly supported early childhood programs. The report includes many useful recommendations, among them national standards for pre-K learning, assessment, and data collection.

However, in this blog post I want to focus on one crucial omission in this otherwise excellent report. The report notes that policymakers need to “determine what is working and why” in early childhood education, yet it never discusses what I believe to be the most promising approach to rigorous, large-scale evaluation of pre-K programs: “regression discontinuity design” evaluation. Implementing this approach has some unusual implications for what data on pre-K programs should be collected, and how.

For any educational or social program, evaluations of program effects may be biased by unobserved pre-program differences between participants and non-participants. These differences may arise from many causes, including selection by program administrators, selection by program participants (or their parents, in the case of early childhood programs), or selection by various happenstances. Because of these unobserved differences, any gap in outcomes between participants and non-participants may reflect the unobserved differences rather than the true effects of the program. This is referred to as “selection bias” in estimating program effects.

In the case of pre-K programs, administrators may in some cases seek to enroll “needier” children. Participants will then tend to have worse outcomes than non-participants simply because needier children were selected. On the other hand, among eligible children, more assertive parents may be more likely to enroll their children in a pre-K program, which tends to select children who would do better even without the program.

The “gold standard” for eliminating selection bias is random assignment. If eligible families who apply to a pre-K program are randomly selected for participation, then we can reasonably expect that as the sample size increases, program participants and non-participants will on average be very close in all observed and unobserved characteristics.  Therefore, any differences in outcomes between participants and non-participants must reflect a true effect of the program, rather than unobserved pre-program differences.

However, it is infeasible to do random assignment evaluations of all pre-K programs. Random assignment experiments are difficult to run properly. They require soliciting excess applications for the available slots, which is both logistically difficult and ethically troublesome. It is easy for program staff to make errors in random assignment, so it is better for the assignment to be run by an independent researcher. And for universally accessible pre-K, it would be ethically questionable to exclude some children from a program that is meant to be open to everyone.

A good substitute for random assignment in evaluating pre-K programs is what is called a regression discontinuity design. This approach has been used to evaluate pre-K programs in multiple states, as well as in Tulsa. Regression discontinuity designs of this type only allow an evaluation of pre-K programs’ effects on kindergarten readiness, as measured by kindergarten entrance test scores. However, knowing a program’s true effects on kindergarten readiness is useful information: it can tell us which programs and which program approaches are working, and it can be used over time to improve program effectiveness.

In the case of pre-K programs, implementing regression discontinuity requires giving identical tests to pre-K program entrants and to graduates of these same pre-K programs at kindergarten entrance, at the same time of year. With an age cutoff for kindergarten entrance, the tested children will range in age over a two-year span. In particular, some children will be just old enough to enter kindergarten, while others will have just missed the age cutoff for kindergarten and will be entering pre-K instead.

With this continuous range of ages, we can control for the influence of age on test scores. And the age cutoff for entering kindergarten produces an abrupt shift in whether children were enrolled in the pre-K program for the previous year. If the pre-K program is effective, we expect test scores to increase smoothly with age but to jump abruptly as we move from children who just missed the age cutoff for being in pre-K the previous year to children who just made it. Because everyone in the sample is a participant in the pre-K program being examined, whether entering it now or having completed it last year, there is no selection bias from comparing participants to non-participants.
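In regression terms, this logic can be written as a simple estimating equation (a standard way of expressing an age-cutoff discontinuity design; the notation is my own sketch, not taken from the CAP report):

$$ \text{Score}_i = \alpha + \beta \, \text{PreK}_i + f(\text{Age}_i - c) + \varepsilon_i $$

Here PreK equals 1 if child *i* made the age cutoff *c* and thus attended pre-K in the previous year, *f* is a smooth function of age (often just linear, with separate slopes allowed on each side of the cutoff), and the coefficient β captures the jump in test scores at the cutoff, which is the estimated effect of a year of pre-K.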

This regression discontinuity approach estimates program effects by comparing test scores at pre-K entrance and at kindergarten entrance, with the full range of ages used to control for age. Pre-K participants serve as the control group when they are entering pre-K, and as the treatment group when they are entering kindergarten. We don’t have to do any complicated random assignment or matching or other analyses to create a control group.
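To make this concrete, here is a minimal simulation of the design in Python. Everything in it is an illustrative assumption rather than data from any actual evaluation: the 60-month cutoff, the score model, and the 5-point “true effect” are invented, and the regression uses the statsmodels package.

```python
# Sketch of an age-cutoff regression discontinuity design for pre-K.
# All numbers are invented for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000

# Age in months at test time. Children at or above the 60-month cutoff
# attended pre-K last year and are now entering kindergarten; children
# below it are just entering pre-K.
age = rng.uniform(48, 72, n)
cutoff = 60.0
attended_prek = (age >= cutoff).astype(float)

# Simulated test score: increases smoothly with age, plus an assumed
# 5-point jump at the cutoff (the "true" pre-K effect), plus noise.
score = 20 + 0.8 * (age - cutoff) + 5.0 * attended_prek + rng.normal(0, 4, n)

# Local linear RD regression: score on the treatment indicator and age
# centered at the cutoff, allowing separate age slopes on each side.
age_c = age - cutoff
X = sm.add_constant(np.column_stack([attended_prek, age_c, attended_prek * age_c]))
model = sm.OLS(score, X).fit()

# The coefficient on the treatment indicator is the estimated jump in
# scores at the cutoff, i.e., the effect of a year of pre-K.
print(f"Estimated pre-K effect: {model.params[1]:.2f} points")
```

In a real evaluation, researchers would also restrict attention to children near the cutoff and check that the estimate is robust to the choice of age window and to the functional form used for age.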

However, this approach depends upon a somewhat unusual testing procedure. We have to use tests that are appropriate to administer to both pre-K entrants and kindergarten entrants, with the tests covering the wide range of abilities of children over that two-year age interval.  We have to administer the tests in the same way at the same time of year, even though it might be more customary to test program graduates at the end of the pre-K year rather than at the beginning of kindergarten.   This is not a testing approach that is likely to be adopted by accident.

I should mention that this approach is compatible with testing a wide variety of skills. For example, the administered tests could examine a child’s social skills, not just more academic skills. Nor does this approach necessarily involve over-testing children. The tests could be quite short, as long as they are given in the same form to both pre-K entrants and kindergarten entrants.

If we are to have continuous improvement in pre-K programs, we need good information on their effectiveness. This requires approaches that can separate the true effects of various pre-K programs from the effects of who is selected into them. Simply collecting good data on children, as the CAP report advocates, isn’t enough; good data by themselves will not identify and estimate pre-K programs’ true effects. Large-scale analysis of program effectiveness requires that some assessment data be collected identically at both pre-K entrance and kindergarten entrance.
