Summary: A December 4 USA Today op-ed argues against expanding pre-K programs. The main argument is that Oklahoma test scores haven’t increased dramatically, even though the state has significantly increased pre-K access.
But a sample size of one state is an inadequate research basis for policy. Many trends in demographics, the economy, and K-12 education can cause large fluctuations in test scores. Oklahoma’s test scores actually did rise slightly in the appropriate 4th grade tests that followed the greatest jump in preschool attendance. But further analysis of test score trends shows they could be interpreted as consistent with either large positive impacts, or zero impacts, of the state’s pre-K program. The many forces affecting state test scores create too much uncertainty for one state’s test score trends to provide estimates precise enough to guide policymakers. The same is true for any other single factor that high-quality evaluations have shown to be associated with educational improvements. Would a state roll back a requirement that teachers have bachelor’s degrees, or allow kindergarten classes of 30 students, simply because test scores aren’t improving enough?
The more rigorous evidence on pre-K programs is found in studies that compare individual pre-K participants with similar individuals who do not participate in pre-K. Such studies hold constant the demographic, economic, and educational trends that can affect educational and economic success, and isolate the true cause-and-effect relationship between pre-K participation and success in life. These studies, covering a variety of state and local programs around the U.S., have found strong evidence that quality pre-K programs can not only improve student test scores, but can also increase later educational attainment and adult earnings.
A recent op-ed in USA Today by Red Jahncke argued against expanding pre-K programs, based on Mr. Jahncke’s interpretation of the experience of Oklahoma (“Get pre-K facts before investing billions”, December 4, 2013).
Mr. Jahncke’s intuitive argument is that Oklahoma has been one of the most aggressive states in expanding access to high-quality pre-K, so why hasn’t Oklahoma become paradise on Earth? In particular, why haven’t test scores gone up more in Oklahoma?
This article appeals to a natural human intuition. We love anecdotes. We love case studies of individual people or places. We are often persuaded by the individual story, even when a more hard-headed statistical analysis would argue that the story doesn’t prove much of anything and is dominated by more solid research evidence.
I have already addressed in a previous post the “why isn’t Oklahoma paradise” issue, which was raised in a previous Wall Street Journal op-ed. Here’s the short summary of my response:
- A sample size of one state is too small to really tell whether pre-K is having its expected effects on test scores. There’s too much else changing in individual states to reliably detect the expected effects of pre-K, as these other changing factors create a lot of noise, uncertainty, and volatility in individual state test scores. Individual state case studies provide weak research evidence relevant to any hypothesis for or against pre-K.
- More reliable evidence is provided by studies that compare the future life paths of children who participate in pre-K, versus similar children who do not participate in pre-K. These studies have much larger sample sizes and greater reliability. They show that high-quality pre-K programs can improve both short-run test scores and long-run educational attainment and earnings. These studies include not only the Perry Preschool Study, but also the various state and local pre-K studies, and in particular the Chicago Child-Parent Center study.
Mr. Jahncke argues that Oklahoma’s test scores have stagnated over the past ten years. Actually, if one looks at data from the National Assessment of Educational Progress (NAEP), 4th grade math scores in Oklahoma from 2003 to 2013 increased by 10 points, slightly faster than the national increase of 7 points. Oklahoma’s 4th grade reading scores went up by 3 points, slightly less than the national increase of 4 points.
But even this is not a clean-cut “natural experiment”. There have been many other big changes in both Oklahoma and the rest of the U.S. over this ten-year time period. Pre-K access also increased in the U.S. over this time period. There’s too much noise to really tell whether pre-K in Oklahoma has made a difference.
As I argued in a past post, a little closer to a “natural experiment” is comparing test scores from 2003 to 2005. This comparison captures an abrupt jump in Oklahoma pre-K access five years earlier, when these 4th graders were age 4. The Oklahoma 4th graders who took the NAEP in 2003 were age 4 in 1997-98, when 5% of all Oklahoma 4-year-olds were in state-funded pre-K. The Oklahoma 4th graders who took the NAEP in 2005 were age 4 in 1999-2000, when 51% of all Oklahoma 4-year-olds were in state-funded pre-K.
Over that time period, there was the most abrupt increase in Oklahoma pre-K enrollment of any 2-year period. Because these observations are only two years apart, the statistical noise from other changing factors is somewhat reduced.
Based on NIEER’s study of how much Oklahoma pre-K increases kindergarten test scores, we would expect Oklahoma pre-K’s expansion from 1997-98 to 1999-2000 to increase aggregate 4th grade test scores from 2003 to 2005 by a little less than 3 points on the NAEP. My previous post gives more details on this calculation.
Why isn’t the expected test score increase greater? First, the increase in enrollment is only 46 percentage points (from 5% to 51% of all children), not 100, which cuts the expected aggregate increase roughly in half. Second, we know there is some fading of test score effects from kindergarten to 4th grade. This fading is observed in Perry Preschool, the Chicago Child-Parent Center study, Head Start studies, and many other studies in which, even with some test score fading, pre-K has strong effects in adulthood on educational attainment and earnings. These strong adult effects may be attributable to effects on “soft skills” (social skills) that are not measured well by standardized tests.
The trouble is that even over a two-year period, the statistical uncertainty in how much Oklahoma’s test scores would go up is very great. This statistical uncertainty is probably plus or minus 6 points. Thus, although the pre-K enrollment increase from 1997-98 to 1999-2000 might be expected to increase Oklahoma’s 4th grade test scores relative to the nation from 2003 to 2005 by 3 NAEP points, this is 3 points plus or minus 6. So it would not be surprising for Oklahoma test scores to DECLINE relative to the nation over such a period by 3 points, or to go up by 9 points.
The actual test score change is that, from 2003 to 2005, Oklahoma’s scores increased by about 2 points more than the nation’s in math, and by about one-half point less than the nation’s in reading. This observed test score change is not statistically significantly different from the expected relative test score increase of 3 points. It is also not statistically significantly different from zero. We simply can’t tell. And we can’t tell because it is impossible to reliably detect an expected test score effect of 3 points when the statistical uncertainty in your case study estimation is plus or minus 6 points.
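The comparison in the last two paragraphs can be written out as a short calculation. This is only a sketch: it takes the 3-point expected gain, the plus-or-minus 6-point uncertainty, and the observed relative changes (about +2 in math, about -0.5 in reading) from the text, and it assumes the uncertainty can be treated as a simple symmetric band. The function names are illustrative, not from any library.

```python
def interval(center, half_width):
    """Return the (low, high) band around a central estimate."""
    return (center - half_width, center + half_width)


def consistent_with(observed, hypothesized, half_width):
    """True if the observed change lies within the band around a hypothesized effect."""
    return abs(observed - hypothesized) <= half_width


EXPECTED_GAIN = 3.0   # expected relative NAEP gain from the pre-K expansion (from the text)
HALF_WIDTH = 6.0      # approximate statistical uncertainty (from the text)

# Observed 2003-2005 change in Oklahoma relative to the nation (from the text).
observed_changes = {"math": 2.0, "reading": -0.5}

low, high = interval(EXPECTED_GAIN, HALF_WIDTH)
print(f"Plausible range if pre-K works as expected: {low} to {high} points")

for subject, change in observed_changes.items():
    fits_expected = consistent_with(change, EXPECTED_GAIN, HALF_WIDTH)
    fits_zero = consistent_with(change, 0.0, HALF_WIDTH)
    print(f"{subject}: change {change:+}, "
          f"consistent with expected effect: {fits_expected}, "
          f"consistent with zero effect: {fits_zero}")
```

Both observed changes fall inside the band around the expected 3-point gain and inside the band around zero, which is why the single-state comparison cannot separate the two hypotheses.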
Why so much statistical uncertainty? In part, this is because the NAEP has a limited sample size in individual states, so there is some variation simply because a given year might happen to have a better or worse sample in a given state. But even more of the uncertainty arises because much else changes in a state, even over a short time period, in ways that affect test scores: changes in socioeconomics and demographic composition, changes in the K-12 system, etc.
We can see this natural volatility in prior Oklahoma data. Even during time periods in which Oklahoma pre-K access did not change much, Oklahoma test scores have jumped by 5 or 6 points over short time periods. For example, from 2000 to 2003, Oklahoma’s 4th grade math test scores increased by 5 points, whereas from 1998 to 2002, 4th grade reading test scores dropped by 6 points. These are time periods for which there was no significant prior change in pre-K enrollment for these cohorts, if we trace them back to what was going on in Oklahoma pre-K enrollment at age 4.
In other words, state test scores have so much natural volatility that it is very difficult to distinguish the signal from the noise in the test score trends of one individual state, even over short time periods, and even when pre-K enrollment significantly increased in that state over the relevant time period.
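To put a rough number on the signal-versus-noise problem, one can ask how often a single-state comparison would flag a true 3-point effect as statistically significant. The sketch below assumes, as an illustration only, that the plus-or-minus 6 points mentioned earlier is a 95% interval and that the noise is normally distributed; neither assumption is stated in the underlying analysis, and the function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def detection_power(true_effect, half_width, z_crit=1.96):
    """Chance that a two-sided test rejects 'zero effect' given the true effect.

    Assumes half_width is a 95% interval half-width, so the standard
    error is half_width / z_crit (an illustrative assumption).
    """
    standard_error = half_width / z_crit
    shift = true_effect / standard_error
    upper_tail = 1.0 - normal_cdf(z_crit - shift)
    lower_tail = normal_cdf(-z_crit - shift)
    return upper_tail + lower_tail


power = detection_power(true_effect=3.0, half_width=6.0)
print(f"Chance of detecting a true 3-point effect: {power:.1%}")
```

Under these assumptions, a real 3-point effect would register as statistically significant only about one time in six, so a failure to see a clear jump in one state says very little about whether pre-K worked.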
Now, one might argue that if one can’t see large test score increases for an individual state due to pre-K enrollment, it must be that these test score increases aren’t important. Not so. The NIEER study of Oklahoma suggests that Oklahoma pre-K increases test scores at kindergarten entrance by about 13 percentile points. Most parents would regard that as sizable. Studies by Chetty suggest such a test score increase would increase future adult earnings by about 7%. That seems like a sizable effect as well. The present value of the future increase in earnings over the entire adult working career is about $20,000. This is for a pre-K program whose annual cost is less than $5,000 for a half-day program and less than $9,000 for a full-day program. But even those test score increases, which are sizable at kindergarten entrance and associated with large adult earnings effects, would be hard to detect in aggregate test score data at 4th grade.
People often fail to recognize two things:
- Even quite modest test score gains due to pre-K at kindergarten entrance will predict very large adult earnings gains.
- Pre-K doesn’t produce a miracle in standardized test performance. The test score gains are there, but they do not eliminate all the testing problems of American students.
So, pre-K advocates should not overclaim what pre-K can do. Pre-K can produce improvements in life course that will produce adult earnings gains of 3 to 5 times the cost of these programs. But these programs do not by themselves solve all the problems of disadvantaged students. Many other policies must also be pursued to deal with difficult problems of income inequality and poverty.
But everyone should recognize how important even modest improvements in education and skills development can be to individuals and to the overall economy. It is worth spending significant funds if we can have even moderate effects on skills development in the U.S.
If case study evidence of one state is too volatile and uncertain, what can provide better evidence? Better evidence is provided by the many studies I have mentioned above that compare individuals who participate in pre-K with similar individuals who don’t participate in pre-K. These studies have several statistical advantages:
- They control for individual demographics and socioeconomics by comparing similar individuals.
- They control for overall trends in the K-12 system and in society and the economy by comparing test scores, educational attainment, or earnings of individuals at the same point in time.
- These studies typically compare one treatment group of whom 100% or close to 100% participated in the pre-K program being studied, versus a comparison group in which few or none participated in that pre-K program, whereas aggregate studies of a state typically compare less extreme changes in pre-K participation.
Because these studies of individual outcomes have much better controls for demographics, socioeconomics, and social and educational trends, and have much larger independent variation in pre-K participation, they can provide more statistically precise and reliable estimates of the effects of pre-K programs.
We might also consider state-level studies, but ones that include many states, not just one state versus the nation. For example, I have pointed out before that estimates suggest that variations across states in state pre-K enrollment appear to be statistically associated with increases in NAEP scores that are large enough to predict a high benefit-cost ratio for pre-K, and are of a similar size to estimates that compare individual pre-K participants versus non-participants.
The bottom line is that a case study of one individual state simply does not produce very precise statistical evidence for or against the effects of any social, educational or economic intervention.
But human beings love anecdotes about individuals, communities or states. Regardless of our politics, we like to use an individual case study to support our prior beliefs about the way the world works. Both conservatives and liberals do this.
A recent example of the use of case study evidence is a New York Times opinion piece comparing Minnesota’s economy and politics to Wisconsin’s economy and politics. As the opinion piece points out, Minnesota’s economy has recently done better than Wisconsin’s economy. And Minnesota’s politics have been controlled more by Democrats, whereas Wisconsin’s have been controlled more by Republicans.
But is this good evidence in favor of Democratic policies to advance state economies over Republican policies? I would say no, it’s not strong evidence. There is simply too much going on that affects state economic performance for this comparison of two states, by itself, to tell us much about what state policies work to promote economic development. In statistical terms, the noise in short-term state economic trends dominates the plausible short-run effects of state government policies.
Now, if we had information on many more states, or comparisons of groups of businesses within the states differentially affected by state policies, then we might be able to reach some more reliable conclusions on what works in state economic development policies. But just looking at one or two states’ aggregate performance is not a strong argument by itself.
People need to recognize that not all statistical data that is provided as “evidence” really produces precise or reliable information for or against a hypothesis about whether a particular policy or program is working. Case studies of one or two individual states or communities rarely provide evidence that is statistically precise enough to prove or disprove any program’s effectiveness. We need better studies with more controls for other factors, and more independent variation in program access. And we need to look at many such studies, not just a few studies.