Summary: A December 4 USA Today op-ed argues against expanding pre-K programs. The main argument is that Oklahoma test scores haven’t increased dramatically, even though the state has significantly increased pre-K access.
But a sample size of one state is an inadequate research basis for policy. Many trends in demographics, the economy, and K-12 education can cause large fluctuations in test scores. Oklahoma’s test scores actually did rise slightly in the appropriate 4th grade tests that followed the greatest jump in preschool attendance. But further analysis of test score trends shows they could be interpreted as consistent with either large positive impacts, or zero impacts, of the state’s pre-K program – the many forces affecting state test scores create too much uncertainty for one state’s test score trends to provide precise enough estimates to be a useful guide for policymakers. The same is true for any other single factor that has been proven to be associated with educational improvements through high quality evaluations — would a state roll back a requirement that teachers have bachelor’s degrees or allow kindergarten classes to have 30 students in them, if test scores aren’t improving enough?
The more rigorous evidence on pre-K programs is found in studies that compare individual pre-K participants with similar individuals who do not participate in pre-K. Such studies hold constant the demographic, economic, and educational trends that can affect educational and economic success, and isolate the true cause and effect relationship between pre-K participation and success in life. These studies, in a variety of state and local programs around the U.S., have found strong evidence that quality pre-K programs can not only improve student test scores, but can increase later educational attainment and adult earnings.
A recent op-ed in USA Today by Red Jahncke argued against expanding pre-K programs, based on Mr. Jahncke’s interpretation of the experience of Oklahoma (“Get pre-K facts before investing billions”, December 4, 2013).
Mr. Jahncke’s intuitive argument runs as follows: Oklahoma has been one of the most aggressive states in expanding access to high-quality pre-K. Why, then, hasn’t Oklahoma become paradise on Earth? In particular, why haven’t test scores gone up more in Oklahoma?
This article appeals to a natural human intuition. We love anecdotes. We love case studies of individual people or places. We are often persuaded by the individual story, even when a more hard-headed statistical analysis would argue that the story doesn’t prove much of anything and is dominated by more solid research evidence.
I have already addressed in a previous post the “why isn’t Oklahoma paradise” issue, which was raised in a previous Wall Street Journal op-ed. Here’s the short summary of my response:
- A sample size of one state is too small to really tell whether pre-K is having its expected effects on test scores. There’s too much else changing in individual states to reliably detect the expected effects of pre-K, as these other changing factors create a lot of noise, uncertainty, and volatility in individual state test scores. Individual state case studies provide weak research evidence relevant to any hypothesis for or against pre-K.
- More reliable evidence is provided by studies that compare the future life paths of children who participate in pre-K with those of similar children who do not. These studies have much larger sample sizes and greater reliability. They show that high-quality pre-K programs can improve both short-run test scores and long-run educational attainment and earnings. These studies include not only the Perry Preschool study, but also various state and local pre-K studies, in particular the Chicago Child-Parent Center study.
Mr. Jahncke argues that Oklahoma’s test scores have stagnated over the past ten years. Actually, if one looks at data from the National Assessment of Educational Progress (NAEP), 4th grade math scores in Oklahoma from 2003 to 2013 increased by 10 points, slightly faster than the national increase of 7 points. Oklahoma’s 4th grade reading scores went up by 3 points, slightly less than the national increase of 4 points.
But even this is not a clean-cut “natural experiment”. There have been many other big changes in both Oklahoma and the rest of the U.S. over this ten-year period, and pre-K access also increased nationally over the same span. There’s too much noise to really tell whether pre-K in Oklahoma has made a difference.
As I argued in a past post, a little closer to a “natural experiment” is comparing test scores from 2003 to 2005. This was the time period when there was an abrupt jump in Oklahoma pre-K access for these 4th graders as of 5 years previously. The Oklahoma 4th graders who took the NAEP in 2003 were age 4 in 1997-98, when 5% of all Oklahoma 4-year-olds were in state-funded pre-K. The Oklahoma 4th graders who took the NAEP in 2005 were age 4 in 1999-2000, when 51% of all Oklahoma 4-year-olds were in state-funded pre-K.
Over that time period, there was the most abrupt increase in Oklahoma pre-K enrollment of any 2-year period. Because these observations are only two years apart, this somewhat reduces the statistical noise from other factors changing.
Based on NIEER’s study of how much Oklahoma pre-K increases kindergarten test scores, we would expect Oklahoma pre-K’s expansion from 1997-98 to 1999-2000 to increase aggregate 4th grade test scores from 2003 to 2005 by a little less than 3 points on the NAEP. My previous post gives more details on this calculation.
Why isn’t the expected test score increase greater? First, the increase in enrollment covered only 46% of all children, not 100%, which roughly cuts the expected aggregate increase in half. Second, we know there is some fading of test score effects from kindergarten to 4th grade. This fading is observed in Perry Preschool, the Chicago Child-Parent Center study, Head Start studies, and many other studies; even with some test score fading, these studies find that pre-K has strong effects in adulthood on educational attainment and earnings. These strong adult effects may be attributable to effects on “soft skills” (social skills) that are not measured well by standardized tests.
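The rough arithmetic behind that figure can be sketched as follows. To be clear, the per-child kindergarten effect (expressed in NAEP-scale points) and the fade-out fraction below are illustrative assumptions standing in for the details in my earlier post, not exact figures from the NIEER study:

```python
# Back-of-the-envelope sketch; the first and third inputs are illustrative assumptions
k_effect_per_child = 12.0      # assumed per-participant effect at kindergarten entry, NAEP-scale points
enrollment_jump = 0.51 - 0.05  # share of the cohort newly served: 5% -> 51% of 4-year-olds
fade_to_grade4 = 0.5           # assumed fraction of the kindergarten effect surviving to 4th grade

expected_gain = k_effect_per_child * enrollment_jump * fade_to_grade4
print(f"expected aggregate 4th-grade gain: {expected_gain:.1f} NAEP points")  # a little less than 3
```

The point is simply that a large per-child effect shrinks twice before it reaches aggregate 4th grade scores: once because not every child was newly served, and again because some of the effect fades.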
The trouble is that even over a two year period, the statistical uncertainty in how much Oklahoma’s test scores would go up is very great. This statistical uncertainty is probably plus or minus 6 points. Thus, although the pre-K enrollment increase from 1997-98 to 1999-2000 might be expected to increase Oklahoma’s 4th grade test scores relative to the nation from 2003 to 2005 by 3 NAEP points, this is 3 points plus or minus 6. So it would not be surprising for Oklahoma test scores to DECLINE relative to the nation over such a period by 3 points, or to go up by 9 points.
The actual change is that from 2003 to 2005, Oklahoma’s scores increased by about 2 points more than the nation’s in math, and about one-half point less than the nation’s in reading. This observed change is not statistically significantly different from the expected relative increase of 3 points. It is also not statistically significantly different from zero. We simply can’t tell, because it is impossible to reliably detect an expected test score effect of 3 points when the statistical uncertainty in your case study estimate is plus or minus 6 points.
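To see how badly a plus-or-minus-6-point uncertainty swamps a 3-point expected effect, here is a minimal simulation. It assumes that “plus or minus 6” corresponds to a roughly 95% interval, i.e., a standard error of about 3 NAEP points:

```python
import random

random.seed(0)

TRUE_EFFECT = 3.0   # expected NAEP gain from the pre-K expansion
SE = 3.0            # reading "plus or minus 6" as a ~95% interval, so SE is about 3
N = 100_000

negative = 0
significant = 0
for _ in range(N):
    observed = random.gauss(TRUE_EFFECT, SE)  # one hypothetical 2003-2005 relative score change
    if observed < 0:
        negative += 1
    if abs(observed) > 1.96 * SE:             # conventional two-sided 5% test
        significant += 1

print(f"draws that look like a relative DECLINE: {negative / N:.0%}")
print(f"draws statistically distinguishable from zero: {significant / N:.0%}")
```

Even though the true effect is positive in every draw, roughly one draw in six looks like a decline, and only about one in six clears conventional statistical significance. That is why a single state’s two-year score change cannot settle the question either way.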
Why so much statistical uncertainty? In part, because the NAEP has a limited sample size in individual states, so scores vary simply because a given year might happen to have a better or worse sample in a given state. But even more of the uncertainty arises because much else changes in a state, even over a short time period, that affects test scores: changes in socioeconomic and demographic composition, changes in the K-12 system, and so on.
We can see this natural volatility in prior Oklahoma data. Even during time periods in which Oklahoma pre-K access did not change much, Oklahoma test scores have jumped by 5 or 6 points over short intervals. For example, from 2000 to 2003, Oklahoma’s 4th grade math scores increased by 5 points, whereas from 1998 to 2002, 4th grade reading scores dropped by 6 points. These are time periods with no significant prior change in pre-K enrollment for the relevant cohorts, if we trace back to Oklahoma pre-K enrollment when those cohorts were age 4.
In other words, state test scores have so much natural volatility that it is very difficult to distinguish the signal from the noise in the test score trends of one individual state, even over short time periods, and even when pre-K enrollment significantly increased in that state over the relevant time period.
Now, one might argue that if one can’t see large test score increases for an individual state due to pre-K enrollment, it must be that these test score increases aren’t important. Not so. The NIEER study of Oklahoma suggests that Oklahoma pre-K increases test scores at kindergarten entrance by about 13 percentile points. Most parents would regard that as sizable. Studies by Chetty suggest such a test score increase would increase future adult earnings by about 7%. That seems like a sizable effect as well. The present value of the future increase in earnings over the entire adult working career is about $20,000. This is for a pre-K program whose annual cost is less than $5,000 for a half-day program and less than $9,000 for a full-day program. But even those test score increases, which are sizable at kindergarten entrance and associated with large adult earnings effects, would be hard to detect in aggregate test score data at 4th grade.
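The order of magnitude of that $20,000 figure can be checked with simple annuity arithmetic. The career length and the real discount rate below are my own illustrative assumptions; the dollars-per-percentile figure is Chetty’s class-assignment-based estimate of $79 of extra annual earnings at age 27 per percentile:

```python
# Order-of-magnitude check; career length and discount rate are assumptions
percentile_gain = 13           # kindergarten-entry boost from the NIEER Oklahoma estimate
dollars_per_percentile = 79    # Chetty's estimate: extra annual earnings at age 27 per percentile
career_years = 40              # assumed length of the working career
discount_rate = 0.03           # assumed real discount rate

annual_gain = percentile_gain * dollars_per_percentile
# present value of a level annuity of annual_gain over career_years
pv = annual_gain * (1 - (1 + discount_rate) ** -career_years) / discount_rate
print(f"extra earnings per year: ${annual_gain:,}")
print(f"present value over the career: ${pv:,.0f}")
```

This lands in the same ballpark as the $20,000 figure; the exact number depends on the discount rate chosen and on discounting the earnings stream back to the age at which the pre-K cost is actually incurred.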
What people often fail to recognize are two things:
- Even quite modest test score gains due to pre-K at kindergarten entrance will predict very large adult earnings gains.
- Pre-K doesn’t produce a miracle in standardized test performance. The test score gains are there, but they do not eliminate all the testing problems of American students.
So, pre-K advocates should not overclaim what pre-K can do. Pre-K can produce improvements in life course that yield adult earnings gains of 3 to 5 times the cost of these programs. But these programs do not by themselves solve all the problems of disadvantaged students. Many other policies must also be pursued to deal with the difficult problems of income inequality and poverty.
But everyone should recognize how important even modest improvements in education and skills development can be to individuals and to the overall economy. It is worth spending significant funds if we can have even moderate effects on skills development in the U.S.
If case study evidence of one state is too volatile and uncertain, what can provide better evidence? Better evidence is provided by the many studies I have mentioned above that compare individuals who participate in pre-K with similar individuals who don’t participate in pre-K. These studies have several statistical advantages:
- They control for individual demographics and socioeconomics by comparing similar individuals
- They control for overall trends in the K-12 system and in society and the economy by comparing test scores, educational attainment, or earnings of individuals at the same point in time.
- These studies typically compare one treatment group of whom 100% or close to 100% participated in the pre-K program being studied, versus a comparison group in which few or none participated in that pre-K program, whereas aggregate studies of a state typically compare less extreme changes in pre-K participation.
Because these studies of individual outcomes have much better controls for demographics, socioeconomics, and social and educational trends, and have much larger independent variation in pre-K participation, they can provide more statistically precise and reliable estimates of the effects of pre-K programs.
We might also consider studies of states, but ones that include many states, not just one state compared with the nation. For example, I have pointed out before that estimates suggest that variations across states in state pre-K enrollment appear to be statistically associated with increases in NAEP scores that are large enough to predict a high benefit-cost ratio for pre-K, and are of a similar size to estimates that compare individual pre-K participants with non-participants.
The bottom line is that a case study of one individual state simply does not produce very precise statistical evidence for or against the effects of any social, educational or economic intervention.
But human beings love anecdotes about individuals, communities or states. Regardless of our politics, we like to use an individual case study to support our prior beliefs about the way the world works. Both conservatives and liberals do this.
A recent example of the use of case study evidence is a New York Times’ opinion piece comparing Minnesota’s economy and politics to Wisconsin’s economy and politics. As the opinion piece points out, Minnesota’s economy has recently done better than Wisconsin’s economy. And Minnesota’s politics have been controlled more by Democrats, whereas Wisconsin’s have been controlled more by Republicans.
But is this good evidence in favor of Democratic policies to advance state economies over Republican policies? I would say No, it’s not strong evidence. There is simply too much going on that affects state economic performance for this comparison of two states, by itself, to tell us much about what state policies work to promote economic development. In statistical terms, the noise in short-term state economic trends dominates the plausible short-run effects of state government policies.
Now, if we had information on many more states, or comparisons of groups of businesses within the states differentially affected by state policies, then we might be able to reach more reliable conclusions about what works in state economic development policy. But just looking at one or two states’ aggregate performance is not a strong argument by itself.
People need to recognize that not all statistical data that is provided as “evidence” really produces precise or reliable information for or against a hypothesis about whether a particular policy or program is working. Case studies of one or two individual states or communities rarely provide evidence that is statistically precise enough to prove or disprove any program’s effectiveness. We need better studies with more controls for other factors, and more independent variation in program access. And we need to look at many such studies, not just a few studies.
If we had more and better pre-K programs, perhaps our future pundits would be less statistically illiterate. Just had to get that zinger in….
To be fair about this, I would say that it is a natural human tendency to over-generalize from samples of one. We all do it. I think even people with advanced statistics or econometrics do it. We evolved to be very quick to generalize as human beings, as the advantage went to the hunter or food-gatherer who was willing to learn from one experience.
My compliments! In terms of fairness, you are more than two standard deviations above average! Thank you for the important work that you do.
On another note, interest in expanding pre-K programs using pay-for-performance contracts and social impact bonds is on the rise. Do you think this will lead to more thoroughgoing quantitative evaluations of pre-K effects? Most interest seems to be focused on how pre-K may lower special education assignment rates in grade school. Any thoughts?
Tim,
You reference Chetty’s STAR paper to say that programs that raise kids’ Kindergarten test scores have large financial implications in adulthood. I don’t think I read Chetty’s paper the way you do. I think there are two important observations from it.
First, higher test scores in K are correlated with higher income as an adult, but there is no evidence in this paper that doing something to raise this score is the cause of higher future income. Instead, it may just be the case that genetically smart kids have higher K test scores and also make more money in life.
Second, having a quality teacher, as defined by the test scores of other kids in the classroom, increases future earnings.
I believe you make the claim that because pre-K increases K test scores, and Chetty’s paper shows correlation between higher K test scores and earnings, pre-K leads to higher earnings. But correlations are not transitive. I’d guess there is correlation between kids who go to the Caribbean for the winter holidays in Kindergarten and their earnings as an adult. But if you then start a program whereby the school paid for everyone to spend a week in the Caribbean once, I doubt it would have an effect on lifetime earnings.
Just because evidence shows indicator B is correlated with outcome C, and you can start a program A that affects B, you can’t say that program A affects C. You have to test directly whether there is a link. But there is very little evidence of this.
Perry Preschool supposedly did this, but let’s not forget this program happened 50 years ago and had a tiny treatment arm. If you discount the Nashville and Oklahoma research, I don’t know how one can claim Perry is an important result. I’d argue that its relevance to whether preschool as it functions today affects future outcomes is minimal.
What evidence that doesn’t rely on a misuse of Chetty’s work is there that pre-K as delivered similar to its current form has long term effects on outcomes?
John:
1. As you know, we disagree totally about the implications of Chetty’s work. Chetty’s work shows that increases in a child’s test scores that are induced by the higher test scores of other children in the same class, and that hence are external to the child’s own characteristics (e.g., genetics or family environment), are correlated with higher earnings. Furthermore, in this study, students’ assignment to classes was random, so it is not the child’s own unobserved characteristics that cause them to end up in a class that has higher test scores. I think it is fair to interpret this connection as causal. It could be due to teacher quality, or it could be due to peer effects.
So, I think your examples of correlations that are due to the child’s own characteristics are not an appropriate interpretation of what Chetty finds. What Chetty finds is that if you are randomly assigned to a class in which your classmates get higher test scores, you get higher earnings as an adult. That is a causal effect, not an accident of correlation due to unobservable attributes of the individual.
2. The direct evidence on earnings effects is not only Perry. (Although I wouldn’t discount Perry; the fact that earnings effects are statistically significant with a small sample size is an argument in favor of the study, not against it.) Other direct evidence includes the Chicago Child-Parent Center study, and to a lesser extent the Abecedarian study and Deming’s study of adult outcomes of the Head Start program. I have discussed all of this extensively in previous posts on this blog.
3. The evidence on shorter-term effects of pre-K is more than Tennessee and Oklahoma (actually the Tennessee evidence is weak), but also Boston, West Virginia, Michigan, South Carolina, New Jersey, New Mexico, and Kalamazoo. Again, I have discussed all these topics in previous blog posts.
For those interested in seeing links to the evidence cited in this comment, and more discussion, the search function on this blog will find the relevant prior blog posts.
I think we’re saying the same thing on Chetty. I wrote: “Second, having a quality teacher, as defined by the test scores of other kids in the classroom, increases future earnings.”
But I think you are making a different point. Here is a quote from your study on Tulsa:
“Combining test-score data from the fall of 2006 and recent findings by Chetty et al. (forthcoming) on the relationship between kindergarten test scores and adult earnings, we generate plausible projections of adult earnings effects and a partial cost-benefit analysis of the Tulsa pre-K program.”
I think you claim that pre-K improves Kindergarten test scores and Kindergarten test scores are correlated with adult earnings, so therefore pre-K improves adult earnings. I do not believe you can make this claim.
I think the Chicago Child-Parent Center longitudinal study is the best evidence for a direct link between an early childhood intervention and lifetime earnings. However, this program was ages 3-9, included social services, parent training, home visitation, summer programs, etc., so may not be directly relevant to Head Start.
1. Chetty has both correlational estimates of kindergarten test scores’ effects on adult earnings and causal estimates. The correlational estimate is that each 1-percentile boost in kindergarten test scores increases adult earnings at age 27 by $131 per year. When demographic controls are added, this estimate dips to $94. When the estimates instead rely on the random assignment to classes, they dip to $79. This is the effect of being randomly assigned to a class whose other students happen to have higher test scores, which in turn raises your own test scores. Chetty uses and interprets these estimates as causal; he has no direct measure of teacher quality.
2. It is possible that other types of policies that increase kindergarten test scores will have different effects on adult earnings, due to effects on unobserved skills. But this factor is as likely to mean that Chetty’s estimate leads to a DOWNWARD-biased estimate of the effect of pre-K on adult earnings as an UPWARD-biased one.
3. In the Tulsa paper, we looked to see whether this procedure would yield a biased estimate of the effects of pre-K on adult earnings, as judged by prior experimental studies of pre-K that measured both test score effects at the end of pre-K and effects on adult earnings. We found a close match: effects on end-of-pre-K test scores were a good predictor of adult earnings effects when Chetty’s numbers were used to make the link. For example, for Perry Preschool, early test score effects would have predicted a 16% increase in adult earnings, and the actual experimental result was a 19% increase. For the Chicago Child-Parent Center, end-of-pre-K test scores would have predicted an 8% effect on adult earnings, and what was actually observed was 7%. So using the Chetty estimates to link test score effects with adult earnings seems to yield good predictions.
4. I also note that other studies, for example Currie and Thomas, have similar estimates of how effects of a policy on early test scores predict adult earnings effects.
5. All the estimates I use for the Chicago Child-Parent Center program, and many of the estimates that others use, are based ONLY on the pre-K component of the program. This program was pre-K at ages 3 and 4, and my estimates do not include any effects of the age-5-and-up component of the program. In addition, the estimates for CPC suggest that the children who attended at both ages 3 and 4, although they received somewhat higher benefits, did not receive doubled benefits. So CPC actually provides fairly strong evidence that one year of half-day pre-K can work in increasing adult earnings.
My whole point is Chetty’s work does not provide the basis for the jump that you keep mentioning. I agree pre-K raises K test scores. I agree Chetty’s work shows correlation between K test scores and future earnings. I do not agree that you can make a logical deduction that pre-K increases future earnings from those two statements. The transitive property does not hold for correlation.
Separate from his correlational analysis of kindergarten scores and future income, Chetty makes another point: that teacher quality has a causal link to future earnings. But this causal relationship is irrelevant to the argument that pre-K leads to higher future earnings.
Causal links are transitive; correlation links are not. If A causes B and B causes C, then A causes C. If A causes B, and B is correlated with C, one cannot say that A causes C.
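The non-transitivity point can be made concrete with a toy simulation (every quantity here is invented for illustration): a latent “ability” drives both kindergarten scores and adult earnings, a hypothetical program raises scores directly, and the program nonetheless has zero effect on earnings:

```python
import random

random.seed(1)
N = 50_000

def simulate(treated):
    """Return (score, earnings) pairs; the program shifts scores, never earnings."""
    rows = []
    for _ in range(N):
        ability = random.gauss(0, 1)             # unobserved confounder
        score = ability + random.gauss(0, 1)     # B: kindergarten test score
        if treated:
            score += 1.0                          # program A raises B directly...
        earnings = ability + random.gauss(0, 1)  # C: driven by ability alone
        rows.append((score, earnings))
    return rows

def mean(xs):
    return sum(xs) / len(xs)

control = simulate(False)
treated = simulate(True)

# B and C are clearly correlated in the untreated population (both reflect ability)...
ms = mean([s for s, _ in control])
me = mean([e for _, e in control])
cov = mean([(s - ms) * (e - me) for s, e in control])
print(f"covariance of score and earnings: {cov:.2f}")  # positive, about 1

# ...yet the program's effect on earnings is essentially zero
effect = mean([e for _, e in treated]) - mean([e for _, e in control])
print(f"program effect on earnings: {effect:+.3f}")    # about 0
```

Here A (the program) causes B (scores), and B is correlated with C (earnings), yet A does not cause C, exactly as argued above.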
We’re going around in circles in this discussion. You believe that the relationships Chetty has identified are just correlational, whereas I believe they are causal. My reasons for believing so are twofold: (1) the relationships are identified via random assignment, and hold constant observed and unobserved individual characteristics; (2) when the Chetty relationships are used to predict pre-K adult earnings gains from pre-K test score effects, the predictions are good.
If these aren’t causal relationships, why does combining the Chetty coefficients with early test score impacts of pre-K give such good predictions of pre-K’s impacts on adult earnings?