Ezra Klein had an interesting column recently that argued for stronger evaluation as the key to the government making smart public investments while avoiding excessive deficits. The argument is that rigorous evaluation of government programs is simple common sense, equivalent to a dieter using a scale to measure progress in losing weight.
As an economist, I’ve often put on my policy wonk hat over the years and advocated for more and better evaluation of government programs. Perhaps this could be seen as self-interest. We economists are considered to have some expertise in evaluation, so arguing for evaluation could be seen as a way of drumming up business.
But there really is a common-sense argument for evaluation. If we really want more effective government policies, surely we should want to know how current policies are working, and which program variations work best.
However, I sometimes wonder whether the U.S. has the right political system to appropriately use the results of evaluation. Evaluation results logically should be used to improve public programs. If a program evaluation is negative, this does not necessarily mean that the program should be eliminated. Rather, if the goal of the program is good, we should use the evaluation results to explore what might work better to achieve that policy goal.
In the U.S., that is often not how evaluation results are used. Instead, negative evaluation results are often used to try to kill a program rather than reform it.
In my opinion, this occurs because for many government programs, there is no overall consensus that the goals being pursued are legitimate for government to pursue. Therefore, philosophical opponents of the programs are never persuaded by positive evidence. Even slightly negative evaluation results are mainly used by opponents as ammunition to try to kill the program. In reaction, supporters of the program resist evaluation, or overinterpret positive results as showing more than they actually show.
Let’s consider an example of this problem with respect to early childhood programs. As I have mentioned in earlier blog posts, pre-k programs appear to have lasting effects on adult outcomes for former child participants even though there is considerable fading of effects on academic test scores in middle school and high school. These long-run effects on adult outcomes may be due to effects of pre-k programs on “soft skills” that tend to appreciate over time rather than depreciate, and which are not well measured by academic test scores in middle school and high school.
Yet even though this fading of pre-k's test score effects is well known, every time it is rediscovered in another state, it is used as a reason not to provide state funding for pre-k. However, this fading of pre-k's effects on “hard skills” could equally well be used to justify further reforms of K-12 schools, to make sure that the hard skill gains from pre-k are not lost. Or these findings could be used as a reason to develop better measures of “soft skills” and of how they change from pre-k through the K-12 system.
For evaluation to be constructively used, we need a political culture that has some consensus on appropriate government goals, and some willingness to use evaluation results to improve government programs. In a political culture without any consensus on government goals, it is very hard for evaluations to become anything more than talking points in bitter political fights.