Does early childhood education solve all problems? No, but it is a catalytic investment

David Brooks’s New York Times column of January 24, 2014 reflects a common misunderstanding about how to approach difficult policy issues. In discussing how to “expand opportunity for underprivileged children”, he says that we’ve made the following mistake:

“We’ve probably placed too much emphasis on early education. Don’t get me wrong. What happens in the early years is crucial. But human capital development takes a generation. If you really want to make an impact, you’ve got to have a developmental strategy for all the learning stages, ages 0 to 25.”

This is mostly wrong. Early childhood education investments by themselves, with no other change in public policy, make a “real impact”. Abecedarian/Educare child care/preschool from ages 0 to 5 is estimated to raise the future adult earnings of disadvantaged children by over 25%. Perry Preschool raised adult earnings by 19%. Estimates suggest that the adult earnings boost for disadvantaged children from the Chicago Child-Parent Center pre-K program is 8%, from Tulsa pre-K is 10%, and from Boston pre-K is 15%.

Therefore, we have not placed “too much emphasis on early education”. Rather, we have underinvested, because all these early childhood programs have future benefits, not only in increased adult earnings, but in lower crime, lower remedial education costs, and lower social welfare costs, that are many multiples of their investment costs.

It would cost around $30 billion in additional annual spending to offer voluntary universal pre-K to all 4-year-olds. It would cost around $60 billion in additional spending to provide all children in poverty with high-quality child care and preschool from birth to age 5. In contrast, the latest appropriations bill provides about $1.4 billion in additional annual investments in early care and education. $1.4 billion is clearly a step forward, but in a country the size of the U.S., and given the evidence of the benefits of expanded early investments, it is nowhere near as large an investment as the empirical evidence justifies.

It is true that full investment in early childhood programs would not solve all problems of poverty and income inequality.  If we interpret Mr. Brooks as just saying that we need to consider a wide variety of policies to tackle the challenges of childhood poverty, how could anyone quarrel with that? We need to consider how to improve K-12 schools, how to improve parenting, how to create more jobs and more income for lower income parents, and so on.

But the art of public policy analysis is finding some way out of the sense of futility that arises if we believe that no difficult problem can be significantly addressed without doing everything at once. If only comprehensive strategies will make a real difference, then in the real world, complex social problems have no real solution, because policymakers are seldom likely to adopt a comprehensive strategy in one fell swoop.

What we need to find are solutions that play a key catalytic role, which by themselves will have a high bang for the buck, a high benefit-cost ratio. Early childhood education is one such catalytic investment. We then can proceed incrementally, building on success to add other programs and strategies that can make a difference. That incremental strategy of going from one catalytic investment to another is something that policymakers can do, and that has real hope as a political and social strategy for success.

Posted in Early childhood program design issues, Early childhood programs

Public radio broadcast about Jean Jennings Bartik and the other ENIAC programmers

WMUK, the local public radio station in Kalamazoo, did a radio show on January 15, 2014 about my mother, Jean Jennings Bartik (1924-2011). The show was prompted by the recent publication of her memoirs, Pioneer Programmer: Jean Jennings Bartik and the Computer that Changed the World.  The station has posted a short description of the show’s topic, and a link to a roughly 17-minute broadcast.  The show consisted of an interview with me, and with Kathryn Kleiman of the ENIAC Programmers Project, who is nearing completion of a documentary on the ENIAC programmers.

My mother was one of the first six computer programmers, all women, on the ENIAC, which was the computer that in 1946 directly led to the development of the modern computer industry. Her story was ignored for many years, and is still unknown by many. I know that one of my mother’s main motivations in writing the book was so that the story of the ENIAC programmers might help increase the representation of women in computing and other science and technology fields. Role models matter.

My mother’s book is available at Truman State University Press, Amazon, Barnes and Noble Nook, Apple iBook, and a Kalamazoo bookstore, Bookbug. All proceeds from the book will go to support a scholarship for women in science and technology at my mother’s alma mater, Northwest Missouri State University.

Posted in Uncategorized

The case for pre-K depends not just on empirical details of studies, but on what you view as plausible given what we know about child development, and on how urgently you view the problem of inequality versus the problem of taxes and deficits

The Cato Institute, a well-known libertarian think tank, sponsored a discussion of research on pre-K on January 7, 2014.  I watched a live stream of the event. The discussion featured George Mason professor David Armor, Brookings Institution researcher Russ Whitehurst, and Georgetown professors Deborah Phillips and Bill Gormley.  Conor Williams provided some coverage of this event.

The discussion was really in part a debate, with Armor and Whitehurst arguing that the research evidence is insufficient to support widespread expansion of pre-K programs, and Phillips and Gormley arguing that the research supports the effectiveness of high-quality pre-K in affecting the life course of disadvantaged children.  The arguments included the following:

  1. Armor emphasized what he sees as the deficiencies of the regression discontinuity studies, an argument I have previously discussed.
  2. Whitehurst emphasized the statistical insignificance of the Head Start random assignment results.
  3. Phillips emphasized what we know about child development in early childhood, as well as the statistical consensus on pre-K’s effectiveness summarized in the recent report by a group headed by Yoshikawa and Weiland, which also included Gormley and Phillips.
  4. Gormley argued that the regression discontinuity studies are valid, because there is no sign of attrition bias: the treatment and comparison groups are similar in observable variables.

What occurred to me is that the debate over expanding pre-K is in part a philosophical debate, not one that hinges solely on the details of empirical studies. Whitehurst at some point made the statement that the Head Start random assignment experiment showed “no sustained impacts”. Later on, he stated that after the pre-K year, there were “no effects”. (Whitehurst previously said something similar in a Brookings post last January:  “There is no measurable advantage to children in elementary school of having participated in Head Start….  Head Start does not improve the school readiness of children from low-income families.”)

That’s not exactly what the Head Start random assignment study shows. What it shows is that the point estimate of the effect of Head Start on cognitive skills is statistically insignificantly different from zero as of third grade.  The point estimates of the effects of Head Start in this study decline by over 70% from the end of Head Start to the end of third grade. The resulting point estimate at third grade would predict that Head Start would improve future earnings by a little over 1%, which is not a trivial amount of money over a lifetime. But we cannot statistically reject the possibility that the true effect might be zero. We also cannot statistically reject the possibility that the true effect might be 2 or 3 times as large.  (Note to wonks: this is using data from the Head Start final impact report to calculate average effect sizes of 0.22 and 0.06 at the end of Head Start and the end of 3rd grade on the PPVT, the WJ III Letter-Word ID test, and the WJ III Math Applied Problems test. These test score effects are then used in conjunction with estimates by Chetty et al. of how test scores affect adult earnings.)
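To make the wonk note’s arithmetic concrete, here is a minimal sketch of the calculation. The earnings translation is an assumption on my part, not the study’s exact method: I convert the grade-3 effect size into a percentile gain for an average child, then apply the roughly 0.612%-of-earnings-per-percentile figure derived from Chetty et al. (used later on this blog).

```python
from statistics import NormalDist

# Average effect sizes (in standard deviation units) across the PPVT,
# WJ III Letter-Word ID, and WJ III Applied Problems tests.
es_end_of_head_start = 0.22
es_end_of_grade3 = 0.06

# Decline in the point estimates from the end of Head Start to grade 3.
fade = 1 - es_end_of_grade3 / es_end_of_head_start  # ~0.73, i.e. "over 70%"

# Rough earnings translation (an assumed method, for illustration only):
# a 0.06 SD gain moves an average child up a couple of percentiles, and each
# percentile is worth roughly 0.612% of future earnings (per Chetty et al.).
percentile_gain = (NormalDist().cdf(es_end_of_grade3) - 0.5) * 100
earnings_boost_pct = percentile_gain * 0.612  # on the order of 1 to 1.5%
```

The point is not the exact number, but that a point estimate too imprecise to be statistically distinguishable from zero can still represent a non-trivial sum over a lifetime.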

In addition, past studies of early childhood programs, including Perry, the Chicago Child-Parent Study, kindergarten class quality (Chetty et al.), and Head Start itself suggest that test score effects of these interventions often fade, but then the programs still have larger effects on adult outcomes than would be predicted by these faded test score effects. It is certainly possible that this will occur in the Head Start experiment.

Therefore, the Head Start random assignment study is hardly strong evidence that Head Start, as the program operated in 2002, was effective in improving third grade outcomes.  On the other hand, given that there is much other empirical evidence in favor of pre-K programs, and even Head Start, and given the statistical uncertainty in these Head Start results, the Head Start random assignment experiment is not strong evidence against the effectiveness of all publicly-funded pre-K.

But how do we interpret these results? One possibility is that we have a strong prior belief that the effect of pre-K programs is zero, either because we are generally skeptical of government intervention, or because we don’t think that academic intervention at age 4 makes sense from a child development standpoint. In addition, perhaps we are concerned about the danger of wasting money on a pre-K program that doesn’t work, which will drive up either deficits or taxes, which might be viewed as adding to our fiscal problems.

Another possibility is that we believe, based on the child development literature, that it is plausible that more time in educational programs at age 4 can make a difference.  We find that there is research evidence from a number of studies that supports that plausible hypothesis, even with test score effects fading. In addition, perhaps we are concerned about the dangers of NOT expanding pre-K funding. Income inequality is a pressing problem that is difficult to address. Developing human capital seems like a key way to address income inequality. Adding more early learning time is a straightforward policy that addresses human capital development. We know how to add high-quality early learning time, and we have done so in a number of state and local areas.  Failure to do so may have a large opportunity cost.

But how do we weigh these risks? Is the greater danger from expanding a pre-K program that doesn’t work? Or is the greater danger from not expanding pre-K programs that could make a major difference to many children’s futures?

There will always be some policy uncertainties. It is more difficult to precisely estimate long-run effects of programs than short-run effects, and more difficult to precisely estimate aggregate effects of programs than effects on a specific group of individuals. Random assignment experiments will always be scarce because they are difficult and expensive to run.

How we resolve policy uncertainty is a choice. That choice is based in part not on the empirical evidence, but on our prior beliefs about child development, government intervention, and the relative dangers of excessive government spending versus increased income inequality.

One way to reduce the risk of doing the wrong thing is to expand pre-K, but to do so in a way that maximizes the probability that the intervention is high quality. This suggests that we should err on the side of spending more per child, and that we should be doing a great deal of monitoring of quality and results in pre-K programs.

Posted in Early childhood program design issues, Early childhood programs

The reliability of estimates of effects of state and local pre-K programs on kindergarten test scores

A recent article on pre-K that has gained some public attention (for example, in columns by Mona Charen and Reihan Salam) is “The Dubious Promise of Universal Preschool”, by George Mason professors David Armor and Sonia Sousa, published in the Winter 2014 edition of the journal National Affairs.

Professors Armor and Sousa argue against universal preschool principally on two grounds: first, that the random assignment study of Head Start shows that the test score effects of Head Start quickly fade out; second, that current studies estimating large effects of many state and local pre-K programs on kindergarten test scores may be biased upwards.

On the first issue, I have addressed the controversy over Head Start in previous blog posts. In brief, other good studies of Head Start do suggest long-run effects on adult outcomes of Head Start, even after many test score impacts fade. These long-run effects of Head Start may reflect effects of Head Start on skills that are not well-measured by standardized tests. In addition, a recent article by Steve Barnett of the National Institute for Early Education Research argues that recent efforts to reform Head Start may be increasing the program’s impacts on early literacy compared to its impacts during the period considered by the random assignment study.

On the second issue, I think Professors Armor and Sousa overstate their critique of recent studies of state and local pre-K programs, and that in fact these studies provide good evidence of significant effects of these programs in improving kindergarten entrance test scores. I am hardly a disinterested observer, as I co-authored one of the class of studies they critique. But I think there is good evidence supporting the validity and reliability of these recent studies.

The studies that Armor and Sousa criticize use a “regression discontinuity” methodology. The basic idea is to administer the same tests at the same time of year to two groups of students: those just entering the pre-K program being evaluated, and those just entering kindergarten after completing a year in that program. Because both groups of students are from families that chose to participate in pre-K, the two groups should be similar in both observed and unobserved characteristics.

There is one obvious difference between the two groups: the group entering kindergarten is on average a year older, and therefore would be expected to have higher test scores simply due to being older, even without pre-K. But we can statistically control for the effects of age on test scores. We anticipate that age by itself will have a smooth effect on test scores.  In most state and local areas, entrance into kindergarten and public pre-K programs is based on an age cut-off. In the sample, there are students in the entering pre-K group who just missed the age cut-off for entering pre-K the previous year, and there are students in the entering kindergarten group who just made the age cut-off for entering pre-K the previous year and entering kindergarten this year. We expect to see a smooth increase in test scores with age within the entering pre-K group, and within the entering kindergarten group, with a “jump” in test scores at the age cut-off if pre-K has a significant effect on test scores.

Intuitively, we are comparing students who are just a few days apart in age, and therefore are almost the same, and observing whether being in pre-K a year has increased test scores.  However, from a statistical perspective, we use a broader sample to better identify the effects of age versus the effects of pre-K on test scores.
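For readers who like to see the mechanics, here is a small simulation of the regression discontinuity idea, using entirely synthetic data. The 0.25 SD “true” effect and all other numbers are arbitrary choices for illustration, not estimates from any of these studies.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Age in years relative to the pre-K entry cutoff: children just past the
# cutoff made it into pre-K last year and are now entering kindergarten.
age = rng.uniform(-1, 1, n)
completed_prek = (age >= 0).astype(float)
true_effect = 0.25  # hypothetical pre-K effect, in test score SD units

# Scores rise smoothly with age, plus a jump at the cutoff if pre-K works.
score = 0.5 * age + true_effect * completed_prek + rng.normal(0, 0.3, n)

# OLS with separate age slopes on each side of the cutoff; the coefficient
# on completed_prek estimates the jump in scores at the cutoff.
X = np.column_stack([np.ones(n), age, completed_prek, age * completed_prek])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
print(f"estimated jump at cutoff: {beta[2]:.2f} SD")  # should be near 0.25
```

With real data, the same regression can be run with other observable characteristics as the outcome, to check that nothing except test scores “jumps” at the cutoff.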

As Professors Armor and Sousa acknowledge, these “regression discontinuity” studies of pre-K have found large effects of pre-K on test scores. These large effects occur in my study with Gormley and Adelstein of Tulsa pre-K, in previous studies of Tulsa pre-K,  in a study I did of Kalamazoo pre-K, and in studies of pre-K programs in Boston, Oklahoma, Michigan, New Jersey, West Virginia, South Carolina, Arkansas, and New Mexico.  These large effects are sufficient to predict sizable effects of pre-K on the adult earnings of former child participants, with earnings effects of up to 10% (Tulsa) or 15% (Boston). The ratio of the present value of these earnings effects to the costs of these pre-K programs is over 2 to 1 for middle-class children, and over 3 to 1 for low-income children.

Professors Armor and Sousa’s main critique of these regression discontinuity studies is that these studies may be biased by sample attrition. Specifically, the group entering kindergarten does not include students who left the school district, and this group might tend to have lower test scores, on average.

Sample attrition is a concern for any evaluation study. For example, it can be a major problem in random assignment studies, as I pointed out in a previous blog post on the recent random assignment evaluation of Tennessee’s pre-K program.

However, for several reasons, I do not think that the existing regression discontinuity pre-K studies are seriously biased by attrition.  First, if there were serious biases by attrition, one would expect them to be reflected in differences at the age cut-off in observable variables other than test scores between the comparison group and treatment group. But these regression discontinuity studies test for such jumps in observable variables at the age cutoff, and do not find such differences.

Second, in their study of Boston’s pre-K program, Professors Weiland and Yoshikawa adjust for sample attrition by reweighting their tested sample so that the reweighted sample has observable characteristics that resemble the original population, before attrition. (They can do this because they have information on the demographic and socioeconomic characteristics of students who entered the pre-K program, but who did not end up entering kindergarten in Boston Public Schools.) They find that this reweighting makes little difference to the estimated effects of pre-K.

Both of these procedures rely on observable variables. One could still argue that even though attrition does not lead to biases due to observable variables differing at the cutoff between the comparison and treatment groups, it might lead to differences in unobservable variables between the two groups that could be correlated with test scores.

However, in my study of Kalamazoo’s pre-K program, I re-estimated a regression discontinuity model using only the same students, that is, using data on pre-K entrant scores from one fall combined with data on kindergarten entrant scores from the next fall, and only including students who were observed in both years.  In this model, there cannot be any differences in pre-existing observable and unobservable variables between the comparison and treatment groups, because these are the same students observed at two different times. I found that the estimated test score effects from such a “panel data” model differed little from a more traditional regression discontinuity model, in which we compare all pre-K entrants with only the kindergarten entrants who did not attrit from the sample.

For all these reasons, although I certainly think sample attrition is an important issue that might conceivably bias the regression discontinuity pre-K studies, it does not appear that in practice attrition leads to any significant biases. Therefore, these regression discontinuity studies provide good evidence that pre-K increases kindergarten test scores by a sizable amount, which is likely to lead to large subsequent benefits for participants and for society.

Posted in Early childhood program design issues, Early childhood programs

Published Duke study of North Carolina early childhood programs finds good evidence for spillover benefits for overall student achievement

The recently-published version of a Duke University study provides good evidence that high-quality early childhood programs have sizable spillover benefits for overall student achievement.  The research also suggests that these programs can have ratios of earnings benefits to costs that exceed 10 to 1. The study, by Duke professors Helen Ladd, Clara Muschkin, and Kenneth Dodge, looks at North Carolina’s Smart Start child care program and More at Four pre-K program. I previously did a blog post on an early version of this research.

Smart Start is a state of North Carolina program, begun in 1993, that provides state aid to county partnerships (or in some cases, multi-county partnerships) supporting a wide variety of initiatives to improve the quality of early childhood services. With encouragement from state regulations, most funds have been devoted to child care services, both to fund child care subsidies and to improve child care quality.

More at Four was a state of North Carolina program from 2001 to 2011 that supported full-day pre-K programs at age 4. It has since been renamed NC Pre-K.

The Duke research study uses a clever methodology that provides good evidence of causal effects of the program on student achievement. The research relies on the accidents of history and politics. Both the Smart Start program and the More at Four program were gradually rolled out geographically to different counties. In addition, funding levels have varied over time both statewide and by county. The research examines whether the level of Smart Start spending or More at Four spending in a child’s birth county during the relevant years (ages 0 to 4 for Smart Start, age 4 for More at Four) helps explain the child’s third grade test scores. The research also controls for other factors that might affect a child’s test scores, such as the education of the child’s mother, and demographic factors. Finally, the research controls for other county trends that might affect test scores.

The research finds sizable effects of spending on these programs in a child’s birth county on the child’s subsequent test scores.  Furthermore, the research finds that these effects are particularly strong for the at-risk children who are targeted by both programs. Test score effects are stronger for children whose mother does not have a high school degree.

The research implies extremely high ratios of earnings benefits to costs. The third grade test score effects can be used to predict future earnings effects. The research suggests that typical levels of Smart Start or More at Four spending as of 2009 would be sufficient to increase overall average future earnings, averaged over all four-year-olds in the county, by 1.6% for Smart Start, and 2.9% for More at Four.

This might not sound like much. However, this increase is over an entire future career, from the first job until retirement.  Even if real wages do not increase in the future, average career earnings for Americans exceed $1.5 million, so even a 1% to 3% earnings boost is a huge amount of money.

Furthermore, this earnings boost is very large relative to costs. These simulations are for typical levels of funding as of 2009 for these two programs. This spending would average about $1,100 for More at Four for the typical 4-year-old in the county – the funding per child actually in pre-K was much higher, but only 20% or so of children in most counties were in a state-funded pre-K program.  The funding per child for Smart Start was around $220 per year for 5 years, birth to age 4, or total spending that also is about $1,100 for the average 4-year-old in the county.

To compare the future earnings boost to these cost figures, we need to adjust the future earnings figures for the time value of money – even if there were no inflation, we value a dollar today more than a dollar 20 years from now, because we can save and invest a dollar today to earn interest, and thereby have more than a dollar in the future. If we adjust future dollars to their “present value” at age 4 using an appropriate interest rate, we conclude that Smart Start increased the “present value” of future earnings for all county 4-year-olds by around $13,000.  The ratio of these earnings benefits to costs would be over 10 to 1 (11.7 = $12,854 divided by $1,100).
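The discounting step can be sketched as a one-line function (a generic illustration with made-up numbers, not the study’s actual earnings profile):

```python
def present_value(amount, years_ahead, real_rate=0.03):
    """Value at age 4 of `amount` received `years_ahead` years later,
    discounted at a 3% real rate."""
    return amount / (1 + real_rate) ** years_ahead

# $10,000 of extra earnings received 30 years in the future is worth
# only about $4,100 today:
pv = present_value(10_000, 30)
```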

Similarly, for More at Four, the program would be estimated to increase the “present value” of future earnings by about $24,000. This implies a ratio of earnings benefits to costs of over 20 to 1 (21.5 = $23,690 divided by $1,100).

(Technical note: These calculations used estimates by Chetty et al. that imply that on average, each one percentile increase in 3rd grade test scores increases future earnings by 0.612%. I averaged the reading and math results from the Duke study to get average percentile boosts in test scores. The overall earnings figures are figures derived from the American Community Survey. I adjusted figures used in Bartik, Gormley, and Adelstein  for Tulsa to 2012 national prices.  I assumed that future real earnings grew by 1.2% per year, and adjusted future earnings to a present value at age 4 using a 3% real discount rate. The resulting present value of future earnings as of age 4 for an average child was $818,275.)
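The benefit-cost ratios above follow directly from the figures given; a quick check, using only numbers from the text:

```python
pv_career_earnings = 818_275  # present value at age 4 of an average career
cost_per_child = 1_100        # typical 2009 spending per county 4-year-old

# Present-value earnings benefits per county 4-year-old, from the text.
for program, pv_benefit in [("Smart Start", 12_854), ("More at Four", 23_690)]:
    implied_boost = pv_benefit / pv_career_earnings  # ~1.6% and ~2.9%
    ratio = pv_benefit / cost_per_child              # 11.7 and 21.5
    print(f"{program}: boost {implied_boost:.1%}, ratio {ratio:.1f} to 1")
```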

These estimates imply large spillover benefits of these programs for children other than those directly served by the programs. This is most clearly seen for the More at Four program, which only directly provides pre-K for about 20% of all children, with those services targeted on children from low-income families or with other risk factors.

As Ladd, Muschkin, and Dodge point out, their estimates of how student achievement effects of More at Four vary with demographics provide evidence of broad benefits for all demographic groups. Student achievement effects are stronger for children whose mother had less than a high school degree. But significant student achievement effects also occur for children of white non-Hispanic mothers with a high school degree or more.

Another way to see evidence for spillover effects is to look at the size of the earnings effects for all children, even though only 20% of all children participated in More at Four programs. The 2.9% earnings boost is averaged over all 4-year-olds.  Because only 20% of all children participated, the percentage effects for those children would have to be at least 5 times as great. Furthermore, these at-risk children would tend to have lower lifetime earnings. Assume, based on results from Tulsa, that at-risk children in pre-K might tend to have only 68% of the lifetime earnings of the average person. Then if the More at Four numbers were solely due to effects on children participating in pre-K, the percentage effects on these children’s future earnings would have to exceed 21% (21.3% = 2.9% times 5, divided by 0.68).  This seems much higher than we would expect from even a high-quality full-day pre-K program for one year.  For example, results in Tulsa imply that a full-day program for at-risk children might boost adult earnings by 9 to 10%.
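That back-of-the-envelope calculation is just three numbers:

```python
overall_boost = 0.029       # earnings boost averaged over ALL 4-year-olds
participation_rate = 0.20   # share of children actually in More at Four
relative_earnings = 0.68    # participants' lifetime earnings vs. the average
                            # (an assumption based on the Tulsa results)

# If the entire boost came from participants alone, their own earnings
# would have to rise by over 21%:
implied_participant_boost = overall_boost / participation_rate / relative_earnings
print(f"{implied_participant_boost:.1%}")  # 21.3%
```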

The implication is that the remainder of the earnings boost is not due to effects on participating children, but rather reflects spillover benefits for other children. Where might these spillover benefits come from? As Ladd, Muschkin, and Dodge mention, spillover benefits might occur because “once children are in elementary school, all children might benefit from being in classrooms with higher proportions of children who come to school ready to learn and with less need for remediation”.  Children may learn from each other due to peer effects. With less need for remediation for participants in More at Four, teachers have more time to devote to learning for all children, including non-participants. Finally, if the pre-K program also improves behavior, fewer classroom disruptions will allow for increases in all students’ learning.

What is notable is the size of these spillover benefits. Based on the Tulsa results, the estimates imply that for each dollar increase in future earnings benefits for children participating in More at Four, there is another dollar of future earnings benefits for non-participating children. Ladd, Muschkin, and Dodge reach a similar conclusion based directly on the achievement test results, that estimated effects of More at Four are clearly over double what one would expect if there were no spillover benefits.

Spillover benefits are important because they help answer the question: why should I pay higher taxes for pre-K for other people’s children? One answer is because such tax financing is in my enlightened self-interest: it will benefit my child’s future, and the overall economic future of my community.

Given these results, it is unfortunate that the Smart Start program and North Carolina’s pre-K program have faced budget cuts in recent years.  Such budget cuts save funds for taxpayers in the short-run. However, in the long-run, investing in early childhood programs not only benefits participating children, but boosts overall education achievement for all students and boosts the entire state and national economy.

Posted in Early childhood programs, Economic development

Final two short videos released on early childhood programs

Two more short videos on early childhood programs have been released, combining my words with videos and animation by Detroit Public TV.

One of these videos discusses short-term benefits of early childhood programs. These benefits include reducing remedial education costs, and helping the local economy and local property values by helping parents work and attracting parents with skills to the local economy.

The other video discusses how early childhood programs help support parenting. This includes providing services that parents find hard to duplicate on their own, such as helping children learn how to get along with peers and authority figures in a group setting.

These two videos and the preceding four videos can all be downloaded for free from iTunes.

I greatly appreciate the talents of the staff of Detroit Public TV in developing these videos. I also thank Michigan’s Early Childhood Investment Corporation for providing financial support for these videos.

Posted in Early childhood programs, Timing of benefits

Decisions on pre-K should be based on solid research evidence, not fragile case study evidence

Summary: A December 4 USA Today op-ed argues against expanding pre-K programs. The main argument is that Oklahoma test scores haven’t increased dramatically, even though the state has significantly increased pre-K access.

But a sample size of one state is an inadequate research basis for policy. Many trends in demographics, the economy, and K-12 education can cause large fluctuations in test scores.  Oklahoma’s test scores actually did rise slightly in the appropriate 4th grade tests that followed the greatest jump in preschool attendance. But further analysis of test score trends shows they could be interpreted as consistent with either large positive impacts, or zero impacts, of the state’s pre-K program – the many forces affecting state test scores create too much uncertainty for one state’s test score trends to provide estimates precise enough to be a useful guide for policymakers. The same is true for any other single factor that high-quality evaluations have shown to be associated with educational improvements – would a state roll back a requirement that teachers have bachelor’s degrees, or allow kindergarten classes of 30 students, just because test scores aren’t improving enough?

The more rigorous evidence on pre-K programs is found in studies that compare individual pre-K participants with similar individuals who do not participate in pre-K. Such studies hold constant the demographic, economic, and educational trends that can affect educational and economic success, and isolate the true cause and effect relationship between pre-K participation and success in life. These studies, in a variety of state and local programs around the U.S., have found strong evidence that quality pre-K programs can not only improve student test scores, but can increase later educational attainment and adult earnings.

A recent op-ed in USA Today by Red Jahncke argued against expanding pre-K programs, based on Mr. Jahncke’s interpretation of the experience of Oklahoma (“Get pre-K facts before investing billions”, December 4, 2013).

The intuitive argument of Mr. Jahncke is that Oklahoma has been one of the most aggressive states in expanding access to high-quality pre-K.  Therefore, why hasn’t Oklahoma become paradise on Earth? In particular, why haven’t test scores gone up more in Oklahoma?

This article appeals to a natural human intuition. We love anecdotes. We love case studies of individual people or places. We are often persuaded by the individual story, even when a more hard-headed statistical analysis would argue that the story doesn’t prove much of anything and is dominated by more solid research evidence.

I have already addressed in a previous post the “why isn’t Oklahoma paradise” issue, which was raised in a previous Wall Street Journal op-ed.  Here’s the short summary of my response:

  1. A sample size of one state is too small to really tell whether pre-K is having its expected effects on test scores. There’s too much else changing in individual states to reliably detect the expected effects of pre-K, as these other changing factors create a lot of noise, uncertainty, and volatility in individual state test scores. Individual state case studies provide weak research evidence relevant to any hypothesis for or against pre-K.
  2. More reliable evidence is provided by studies that compare the future life paths of children who participate in pre-K with similar children who do not. These studies have much larger sample sizes and greater reliability. They show that high-quality pre-K programs can improve both short-run test scores and long-run educational attainment and earnings. These studies include not only the Perry Preschool study, but also various state and local pre-K studies, in particular the Chicago Child-Parent Center study.

Mr. Jahncke argues that Oklahoma’s test scores have stagnated over the past ten years. Actually, if one looks at data from the National Assessment of Educational Progress (NAEP), 4th grade math scores in Oklahoma from 2003 to 2013 increased by 10 points, slightly faster than the national increase of 7 points.  Oklahoma’s 4th grade reading scores went up by 3 points, slightly less than the national increase of 4 points.

But even this is not a clean cut “natural experiment”. There have been many other big changes in both Oklahoma and the rest of the U.S. over this ten year time period. Pre-K access also increased in the U.S. over this time period. There’s too much noise to really tell whether pre-K in Oklahoma has made a difference.

As I argued in a past post, a little closer to a “natural experiment” is comparing test scores from 2003 to 2005. This was the time period when there was an abrupt jump in Oklahoma pre-K access for these 4th graders as of 5 years previously. The Oklahoma 4th graders who took the NAEP in 2003 were age 4 in 1997-98, when 5% of all Oklahoma 4-year olds were in state-funded pre-K. The Oklahoma 4th graders who took the NAEP in 2005 were age 4 in 1999-2000, when 51% of all Oklahoma 4-year-olds were in state-funded pre-K.

Over that time period, there was the most abrupt increase in Oklahoma pre-K enrollment of any 2-year period. Because the two observations are only two years apart, the statistical noise from other changing factors is somewhat reduced.

Based on NIEER’s study of how much Oklahoma pre-K increases kindergarten test scores, we would expect Oklahoma pre-K’s expansion from 1997-98 to 1999-2000 to increase aggregate 4th grade test scores from 2003 to 2005 by a little less than 3 points on the NAEP. My previous post gives more details on this calculation.

Why isn’t the expected test score increase greater? First, enrollment increased by 46 percentage points, not 100, which cuts the expected aggregate increase roughly in half. Second, we know there is some fading of test score effects from kindergarten to 4th grade. This fading is observed in Perry Preschool, the Chicago Child-Parent study, Head Start studies, and many other studies; yet even with some test score fading, pre-K in these studies has strong effects in adulthood on educational attainment and earnings. These strong adult effects may be attributable to effects on “soft skills” (social skills) that are not measured well by standardized tests.
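The back-of-envelope arithmetic behind the “a little less than 3 points” figure can be sketched as follows. The per-participant 4th grade effect used here is an illustrative assumption consistent with the text, not NIEER’s exact estimate:

```python
# Back-of-envelope: expected aggregate 4th grade NAEP change from Oklahoma's
# pre-K expansion. The per-participant effect is an assumed illustrative value.
per_child_effect = 6.0         # assumed NAEP points per participant at 4th grade, after fade-out
enrollment_jump = 0.51 - 0.05  # share of 4-year-olds added to state pre-K, 1997-98 to 1999-2000
expected_aggregate = per_child_effect * enrollment_jump
print(round(expected_aggregate, 1))  # about 2.8 points, i.e. "a little less than 3"
```

The point of the sketch is that even a sizable per-participant effect, diluted by partial enrollment and partial fade-out, shows up as only a few aggregate NAEP points.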

The trouble is that even over a two year period, the statistical uncertainty in how much Oklahoma’s test scores would go up is very great. This statistical uncertainty is probably plus or minus 6 points. Thus, although the pre-K enrollment increase from 1997-98 to 1999-2000 might be expected to increase Oklahoma’s 4th grade test scores relative to the nation from 2003 to 2005 by 3 NAEP points, this is 3 points plus or minus 6. So it would not be surprising for Oklahoma test scores to DECLINE relative to the nation over such a period by 3 points, or to go up by 9 points.

The actual change: from 2003 to 2005, Oklahoma increased by about 2 points more than the nation in math, and about half a point less than the nation in reading. This observed change is not statistically significantly different from the expected relative increase of 3 points. It is also not statistically significantly different from zero. We simply can’t tell, because it is impossible to reliably detect an expected effect of 3 points when the statistical uncertainty in the case study estimate is plus or minus 6 points.
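To see why, treat the “plus or minus 6 points” as roughly a 95% confidence interval, which implies a standard error near 3 points. This quick check is a sketch, not the formal analysis, but it shows that the observed 2-point relative math gain is statistically consistent with both a 3-point effect and a zero effect:

```python
# An observed 2-point relative gain cannot distinguish a 3-point effect from zero
# when the standard error is about 3 points ("plus or minus 6" read as ~95% bounds).
se = 3.0
observed = 2.0  # Oklahoma's relative math gain, 2003-2005
for hypothesized_effect in (3.0, 0.0):
    z = (observed - hypothesized_effect) / se
    print(hypothesized_effect, round(z, 2), abs(z) < 1.96)  # True in both cases
```

Both z-statistics are well inside conventional significance bounds, so neither hypothesis can be rejected.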

Why so much statistical uncertainty? In part, because the NAEP has a limited sample size in individual states, so scores vary simply because a given year may happen to draw a better or worse sample in a given state. But even more of the uncertainty arises because much else changes in a state, even over a short period, that affects test scores: changes in socioeconomic and demographic composition, changes in the K-12 system, and so on.

We can see this natural volatility in prior Oklahoma data. Even during periods in which Oklahoma pre-K access did not change much, Oklahoma test scores have jumped by 5 or 6 points over short spans. From 2000 to 2003, Oklahoma’s 4th grade math scores increased by 5 points, whereas from 1998 to 2002, 4th grade reading scores dropped by 6 points. These are periods with no significant prior change in pre-K enrollment for these cohorts, tracing back to Oklahoma pre-K enrollment when each cohort was age 4.

In other words, state test scores have so much natural volatility that it is very difficult to distinguish the signal from the noise in the test score trends of one individual state, even over short time periods, and even when pre-K enrollment significantly increased in that state over the relevant time period.

Now, one might argue that if one can’t see large test score increases for an individual state due to pre-K enrollment, it must be that these test score increases aren’t important. Not so. The NIEER study of Oklahoma suggests that Oklahoma pre-K increases test scores at kindergarten entrance by about 13 percentile points. Most parents would regard that as sizable. Studies by Chetty suggest such a test score increase would increase future adult earnings by about 7%. That seems like a sizable effect as well.  The present value of the future increase in earnings over the entire adult working career is about $20,000. This is for a pre-K program whose annual cost is less than $5,000 for a half-day program and less than $9,000 for a full-day program.  But even those test score increases, which are sizable at kindergarten entrance and associated with large adult earnings effects, would be hard to detect in aggregate test score data at 4th grade.
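To illustrate how a roughly 7% earnings boost can translate into a present value near $20,000, here is a sketch with assumed inputs: average annual earnings of $20,000 over ages 20 to 59, a 3% real discount rate, and discounting back to age 4. These are illustrative numbers chosen for the sketch, not the actual parameters of the Chetty study:

```python
# Illustrative present-value check of the ~$20,000 earnings-gain figure.
# All inputs are assumptions chosen for this sketch.
avg_earnings = 20_000   # assumed average annual earnings, ages 20-59
boost = 0.07            # earnings effect of the kindergarten test score gain
discount = 0.03         # assumed real discount rate; discount back to age 4
pv_gain = sum(avg_earnings * boost / (1 + discount) ** (age - 4)
              for age in range(20, 60))
print(round(pv_gain))   # on the order of $20,000
```

Even with modest assumed earnings, a small percentage boost compounded over a 40-year career lands in the same ballpark as the $20,000 figure cited above.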

What people often fail to recognize are two things:

  1. Even quite modest test score gains due to pre-K at kindergarten entrance will predict very large adult earnings gains.
  2. Pre-K doesn’t produce a miracle in standardized test performance. The test score gains are there, but they do not eliminate all the testing problems of American students.

So, pre-K advocates should not overclaim what pre-K can do. Pre-K can produce improvements in life course that generate adult earnings gains of 3 to 5 times the cost of these programs. But these programs do not by themselves solve all the problems of disadvantaged students. Many other policies must also be pursued to deal with the difficult problems of income inequality and poverty.

But everyone should recognize how important even modest improvements in education and skills development can be to individuals and to the overall economy. It is worth spending significant funds if we can have even moderate effects on skills development in the U.S.

If case study evidence of one state is too volatile and uncertain, what can provide better evidence?  Better evidence is provided by the many studies I have mentioned above that compare individuals who participate in pre-K with similar individuals who don’t participate in pre-K.  These studies have several statistical advantages:

  1. They control for individual demographics and socioeconomics by comparing similar individuals
  2. They control for overall trends in the K-12 system and in society and the economy by comparing test scores, educational attainment, or earnings of individuals at the same point in time.
  3. These studies typically compare a treatment group in which all or nearly all children participated in the pre-K program being studied with a comparison group in which few or none participated, whereas aggregate studies of a state typically compare much less extreme changes in pre-K participation.

Because these studies of individual outcomes have much better controls for demographics, socioeconomics, and social and educational trends, and have much larger independent variation in pre-K participation, they can provide more statistically precise and reliable estimates of the effects of pre-K programs.

We might also consider studies of states, but studies that include many states, not just one state versus the nation. For example, I have pointed out before that variation across states in state pre-K enrollment appears to be statistically associated with NAEP score increases that are large enough to predict a high benefit-cost ratio for pre-K, and that are similar in size to estimates comparing individual pre-K participants with non-participants.

The bottom line is that a case study of one individual state simply does not produce very precise statistical evidence for or against the effects of any social, educational or economic intervention.

But human beings love anecdotes about individuals, communities or states.  Regardless of our politics, we like to use an individual case study to support our prior beliefs about the way the world works. Both conservatives and liberals do this.

A recent example of the use of case study evidence is a New York Times’ opinion piece comparing Minnesota’s economy and politics to Wisconsin’s economy and politics. As the opinion piece points out, Minnesota’s economy has recently done better than Wisconsin’s economy. And Minnesota’s politics have been controlled more by Democrats, whereas Wisconsin’s have been controlled more by Republicans.

But is this good evidence in favor of Democratic policies to advance state economies over Republican policies? I would say No, it’s not strong evidence. There is simply too much going on that affects state economic performance for this comparison of two states, by itself, to tell us much about what state policies work to promote economic development. In statistical terms, the noise in short-term state economic trends dominates the plausible short-run effects of state government policies.

Now, if we had information on many more states, or comparisons of groups of businesses within states differentially affected by state policies, then we might be able to reach more reliable conclusions about what works in state economic development policy. But just looking at one or two states’ aggregate performance is not a strong argument by itself.

People need to recognize that not all statistical data that is provided as “evidence” really produces precise or reliable information for or against a hypothesis about whether a particular policy or program is working.  Case studies of one or two individual states or communities rarely provide evidence that is statistically precise enough to prove or disprove any program’s effectiveness.  We need better studies with more controls for other factors, and more independent variation in program access. And we need to look at many such studies, not just a few studies.

Posted in Uncategorized | 9 Comments

Pre-K policy should be based on all the evidence, not one study of one state’s programs

Dr. Grover Whitehurst’s latest criticisms of Obama’s preschool plan at the Brown Center website at the Brookings Institution have drawn some attention. He has done numerous posts criticizing Obama’s preschool plan, some of which I’ve responded to in previous posts.

Dr. Whitehurst’s latest criticism is based on recent evidence from the Vanderbilt study of the Tennessee Voluntary Pre-K Program.  This study used a randomized control trial methodology. The results suggest that most of the academic and behavioral effects of Tennessee’s pre-K program had faded by the end of kindergarten and the end of first grade.

Dr. Whitehurst argues the following in his concluding paragraph:

 “I see these findings as devastating for advocates of the expansion of state pre-k programs.  This is the first large scale randomized trial of a present-day state pre-k program.  Its methodology soundly trumps the quasi-experimental approaches that have heretofore been the only source of data on which to infer the impact of these programs.  And its results align almost perfectly with those of the Head Start Impact Study, the only other large randomized trial that examines the longitudinal effects of having attended a public pre-k program.  Based on what we have learned from these studies, the most defensible conclusion is that these statewide programs are not working to meaningfully increase the academic achievement or social/emotional skills and dispositions of children from low-income families.  I wish this weren’t so, but facts are stubborn things.  Maybe we should figure out how to deliver effective programs before the federal government funds preschool for all.”

I have a number of detailed responses to this argument. But to sum up:

Incomplete findings from one good but imperfect study of one state’s quite imperfect pre-K program do not trump the many good studies of many pre-K programs that show that such programs can be effective, with the right resources and design. It is unwise for either opponents or proponents of expanded pre-K to over-react to one study; rather, decisions should be based on the overall weight of the research evidence.

What follows are more detailed responses:

1. I agree with Sara Mead’s comment at Education Week that one can hardly view the latest Tennessee pre-K results as an argument in favor of pre-K. On the other hand, I also agree with her point that this one item of data does not trump all the other good evidence for pre-K effectiveness from numerous studies.

2. Dr. Whitehurst is of the opinion that randomized control trials trump all other evidence by far. I disagree. Why do I disagree?  First, randomized control trials in practice are hard to run perfectly, which often limits their advantages compared to non-randomized studies.

Second, many non-randomized studies have good comparison groups. For example, the Chicago Child-Parent Center studies compare similar neighborhoods with different pre-K access; some Head Start studies compare siblings in the same family with different Head Start enrollment, or counties with different Head Start access due to federal policies; and the regression discontinuity studies of state pre-K compare children whose timing of pre-K access differed based on birth date. All of these studies with good comparison groups find good evidence of pre-K’s effectiveness.

We don’t just throw all this info out because of one study of one program in one state. This would be true even if this Vanderbilt study had no issues and one thought that Tennessee’s program was the best program in the country.

3. In the particular case of the Tennessee evaluation, there were some problems with the randomized trial, particularly in Cohort 1 (2009-2010 pre-K participants): parental consent rates were low and differed between Tennessee pre-K participants and non-participants. In Cohort 1, the study had parental consent to examine data for only 46% of pre-K participants and 32% of non-participants. This improved in the second cohort (2010-2011 pre-K participants), with 74% of participants and 68% of non-participants providing consent. Dr. Whitehurst explicitly says that he focuses his attention on the evidence only in cases where more data were provided due to parental consent.

The problems with parental consent mean that for most of the comparisons, the actual children on whom data were collected no longer constitute a pure random assignment experiment, particularly in Cohort 1. In other words, it could well be that in Cohort 1, although the full treatment sample and full control sample might on average be similar in unobserved characteristics (e.g., parent motivation), as the initial assignment was determined randomly, this might not be at all true of the 46% of pre-K participants and 32% of non-participants for whom most of the data are available.  Parental consent may not be random with respect to unobserved characteristics of children and families.

The Vanderbilt researchers tried very hard to control for this problem, using appropriate methods. However, these methods, such as propensity score matching and statistical controls, are the same methods that people use WITHOUT random assignment data, and have the same issue — one can only control for variables one observes, not variables one does not observe. Furthermore, there are many modeling choices in dealing with these issues, and different modeling choices may yield different results.

4. It is of interest that for one of the variables, retention in kindergarten, for which info is available for the full sample, the pre-K program appears to cut retention in kindergarten from 8% to 4%. That is curious if there are really no end of kindergarten effects on achievement or behavior, which is what the data on the smaller sample suggests. Why would the retention rate be cut in half? Something must be going on to produce this result that we don’t observe for the smaller sample.

Furthermore, in the smaller sample for which parental consent WAS obtained, retention was only cut from 6% to 4% — which is a curious discrepancy between the smaller sample, on which Dr. Whitehurst bases his conclusions, and the full sample.

5. The retention differences mean that more of the weaker pre-K students get promoted to first-grade on time, which may be good for them, but which will tend to depress end of first grade scores in the treatment group relative to the control group.

6. Tennessee’s pre-K program appears to spend, according to NIEER, about $5,814 annually per child for a full-day program. Data from the Institute for Women’s Policy Research suggests that high-quality full-day pre-K might cost $9,000 or so annually per child.   Tennessee has a lower cost of living and lower teacher salaries, but there does seem to be some gap there.  NIEER estimated that Tennessee probably needs to spend at least $2,000 extra per child to consistently deliver quality.

7. As Steve Barnett of NIEER has pointed out, Tennessee’s program results at the end of the pre-K year were on the low end compared to some other state pre-K studies. Perhaps end of pre-K results are more likely to persist if the initial end of pre-K results are larger. Perhaps there is some critical size of effects that one needs to get at the end of pre-K before one can expect much persistence.

8. Sara Mead also raises the point that there may be effects of collective pre-K that differ from individual pre-K. That is, if one puts an entire class through pre-K, and combines this with the right K-3 policies, then teachers in K-3 can teach the entire class more effectively to a higher level. On the other hand, if we just put a few kids through pre-K, then teachers may find that they have to teach the same curriculum at the same pace to meet the needs of the overall class.  This result may tend to drag down any initial advantages for the pre-K kids, particularly if the initial advantages at kindergarten entrance are small.

It is of interest here that the Chicago Child-Parent Center study essentially compared kids in neighborhoods that were similar except for whether they had the CPC program. Did CPC help subsequent classroom teaching improve? Maybe.

9. One has to ask, in general, why studies sometimes find fade-out, with the control or comparison group catching up to the treatment group. Part of it may be that all kids experience the same curriculum, which will tend over time to reduce performance differences in individual comparisons. Another possibility is that teachers intervene to provide extra help to kids who are behind; if there are initially more such kids in the control group, more kids in the control group will get such help. But this is actually another benefit of pre-K: it may reduce the need for teachers to provide remedial help to the pre-K kids, freeing up teacher time for other things.

10. Having said all that: the latest Tennessee pre-K results do not provide any strong evidence in favor of pre-K. Maybe that is due to the lack of full data on all participants, the limitations of Tennessee’s program, or the inability of such a study to capture community-wide effects, to reiterate the points above. It is hard to be sure without better data, ideally on the entire Tennessee sample, and without more in-depth study of what is going on in Tennessee compared, for example, to Tulsa or New Jersey or Boston.

On the other hand, I don’t think the latest Tennessee results provide any strong evidence against the general consensus of the research literature, that many state and local pre-K programs are quite effective.

11. Is the implicit message from Dr. Whitehurst that a pre-K program for which we ONLY have evidence for effects at pre-K exit or kindergarten entrance is of no use? Does that really make sense? Is that really a tenable position? Is that the attitude of most middle-class parents — “We don’t care about whether our child is ready for kindergarten, because we’re sure that any initial advantages will fade.” This needs to be thought through. And one needs to think through why fade-out might occur and what it might mean.

12. Finally, what we really should be talking about is how we can replicate state and local pre-K programs that show much larger effects than in Tennessee, such as the programs in Tulsa, Boston, or New Jersey.

Posted in Early childhood program design issues, Early childhood programs, Local variation in benefits | 6 Comments

My one-page issue brief (with links!) on “Facts from Early Childhood Research”

At the request of the group ReadyNation, I prepared a one-page issue brief. This issue brief was designed to provide a brief review of the facts about early childhood programs for business leaders who are involved with ReadyNation.

The one-page issue brief is cut and pasted below. I hope it provides a useful summary. The issue brief includes links to previous posts and websites and articles/books that back up the statements that are made.

The below will fit on one page if you eliminate some of the line spaces and use Calibri 11-point font (and I’m sure other fonts as well, but that’s what I used).

Facts from Early Childhood Research

Returns are large

Large returns confirmed by recent studies of regular state and local agencies operating at large scale

Fade-out: short-term test scores better predictor of long-term earnings than long-term test scores

  • Perry had fade-out of test score effects, but a 19% earnings boost.
  • Deming: fade-out in Head Start test scores, but later outcomes predict an 11% adult earnings boost.
  • Chetty found that short-term kindergarten test score effects are a good predictor of adult earnings effects, but test effects at 3rd grade predict only one-sixth of actual future earnings effects.
  • Why? Importance of hard-to-measure “soft skills”.
  • Some fade-out may reflect controls getting remediation. Remediation savings are a benefit of pre-K.

Head Start study

Costs affordable, and eventually self-financing

  • High-quality half-day pre-K is $5K/kid, full-day $9K, full-time child care and pre-K from birth to age 5 is $80K/kid, and parenting programs such as Nurse Family Partnership cost $10K/kid.
  • National costs at full scale are $15 billion/yr. for half-day pre-K for all 4-yr-olds, double that for full-day or for adding age 3. Full-time child-care/pre-K from birth to age 5 for all disadvantaged kids would cost $40B/yr. NFP for all disadvantaged 1st-time moms costs $4B/yr.
  • Costs are modest per capita. The universal pre-K cost of $15B/yr is $50 per capita.
  • Fiscal benefits from higher tax revenues and lower welfare/special ed/crime costs exceed costs after 9 to 49 yrs., depending on assumptions.
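The per-capita figure in the brief follows directly from the national totals; here is a quick check, assuming a U.S. population of roughly 300 million:

```python
# Per-capita cost check from the brief's totals.
national_cost = 15e9  # $/yr for universal half-day pre-K for 4-year-olds
population = 300e6    # assumed U.S. population
print(national_cost / population)  # 50.0 dollars per person per year
```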

Pre-K has broad benefits for working class and middle class kids as well as poor

  • Similar $ effects on earnings predicted for both middle-class & poor based on Tulsa/Boston.
  • Why? Quality pre-K hard for middle-class to afford on its own.
  • Targeting still requires big #s: just below half of children less than 5 are below 200% of poverty.

Child-care/parenting program benefits focused on poor kids, but parental benefits may be broader.

  • Why? Middle-class parents may be able to offer/afford such services on their own.
  • Subsidized child-care may have broad benefits for middle-class parents & economy.

State test score trends are consistent with positive benefits of broader access to quality pre-K

Posted in Distribution of benefits, Early childhood programs | 1 Comment

Pioneer Programmer: Jean Jennings Bartik and the Computer that Changed the World

“Pioneer Programmer” is the autobiography of my mother, Jean Jennings Bartik (1924-2011). Truman State University published the book on November 6.

The autobiography focuses on her stories of the early computer industry. My mother was one of the first six computer programmers on the ENIAC, the electronic computer that in 1946 got the computer industry started. The first six computer programmers were all women.  For many years, the “ENIAC women” were ignored in historical accounts, but this has changed in recent years.  For example, all the ENIAC women were inducted into the Women in Technology International Hall of Fame in 1997, my mother was made a Fellow of the Computer History Museum in 2008, and Steve Lohr of the New York Times wrote her obituary in 2011.

Why is this book of general interest? Because an important public policy issue is the under-representation of women in many areas of science and technology, including computers. There are many reasons for this under-representation, but at least one issue is the lack of female role models, as Catherine Rampell pointed out a few weeks ago in the New York Times magazine.

There are many good organizations working hard to increase women’s participation in computers and other technology fields: for example, the Anita Borg Institute, Women in Technology International, Women in Technology, the National Center for Women and Information Technology, and Girls Who Code. But we also need good dramatic stories. And that is what my mother wanted to provide – not only her own story, but also stories of the many fascinating personalities of the early computer industry, including the other women who were involved. She also recounts examples of some of the obstacles that women faced after World War II in making progress in computing, many of which continue today.

My mother wanted to provide stories that would encourage young women to consider science and technology fields as careers. She worked very hard to write this book so it would be interesting to a broad audience. I’m hardly an objective observer, but I think she succeeded. And others agree. Jennifer Light, a professor at Northwestern who has written about the role of women in the early computer industry, describes the book as follows:

 “A firsthand account of the history of American computing from one of the last human computers—who was also one of the first computer programmers—this book combines personal reflections and historical analysis in a lively narrative. Bartik gives readers a sense of the individuals and institutions who shaped computing in the twentieth century as well as her perspective on important issues such as continuing gender disparities in the field. The author’s personality sparkles throughout, and many photographs complement the text. This is a truly unique study and I highly recommend it.”

I have to second Professor Light’s comment that my mother’s personality comes through in the book. It is not an academic treatise. If you want a sense of her personality from a video, this six-minute video from the Computer History Museum mostly consists of an interview with my mother. She elaborates on and tells many more such stories in the book.

Book availability: from Truman State; from Amazon; Amazon Kindle; Barnes and Noble Nook. And if you request it, maybe your local public library.

Posted in Uncategorized | Comments Off on Pioneer Programmer: Jean Jennings Bartik and the Computer that Changed the World