Accountability systems need to improve quality, not make things worse

The New York Times op-ed by Helen Ladd and Ed Fiske that I linked to the other day was based in part on a much longer recent paper by Helen Ladd. That paper is well worth reading. Professor Ladd reviews the research evidence on the link between poverty and educational achievement, discusses some of the many problems with the No Child Left Behind accountability system for K-12 education, and suggests alternative policies.

Professor Ladd’s bottom line includes the following: “The most productive step for the federal government in the short run would be to eliminate No Child Left Behind”. Among the reasons for her position is that the narrow test-based accountability system of NCLB does not seem to be effective in improving education, and has some “undesirable side effects”, as Ladd puts it. Among these side effects is what Ladd describes as a “narrowing of the curriculum”.

I agree that narrow test-based accountability systems are a mistake. To give one example, I heard a speech earlier this week by the Indiana State Superintendent of Public Instruction, Dr. Tony Bennett. Dr. Bennett talked about Indiana’s main goals in improving K-12 education. Among those goals was ratcheting up the rigor and enforcement of the state’s accountability system, which seeks to grade schools and identify failing schools. Dr. Bennett emphasized that this accountability system would focus on each school’s test score value-added.

I asked Dr. Bennett whether the Indiana accountability system would adjust for differences in “summer learning loss” between schools located in high-poverty communities and schools in affluent communities. His frank answer was “No”. His argument was that making up for summer learning loss should be left to the discretion and creativity of individual school districts. It was up to districts to figure out how to offer summer school, extended school years, or other programs to make up for that loss.

One problem with such an accountability system is that it clearly does not assign grades to schools based on their true quality.  Two schools that are doing an equally good job in increasing learning during the school year may get significantly different grades based on how much their students gain or lose in learning during the summer.
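To make the arithmetic concrete, here is a minimal sketch. It assumes the accountability system measures gains from one spring test to the next, so the measured gain combines what happens during the school year with what happens over the summer. That testing schedule, the school labels, and every number below are hypothetical illustrations, not data from Indiana or any other state.

```python
from dataclasses import dataclass


@dataclass
class School:
    name: str
    school_year_gain: float  # hypothetical points gained from fall to spring
    summer_change: float     # hypothetical points gained (+) or lost (-) over the summer


def spring_to_spring_gain(school: School) -> float:
    """Gain a spring-to-spring test would record: school-year learning plus summer change."""
    return school.school_year_gain + school.summer_change


# Two schools that do an equally good job during the school year (assumed values).
schools = [
    School("High-poverty school", school_year_gain=10.0, summer_change=-3.0),
    School("Affluent school", school_year_gain=10.0, summer_change=1.0),
]

measured = {s.name: spring_to_spring_gain(s) for s in schools}

for name, gain in measured.items():
    print(f"{name}: measured annual gain = {gain}")
```

Under these assumed numbers, both schools add the same amount of learning during the school year, yet the measured gains differ by the full difference in summer change, which the school did not cause.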

A second problem is that current levels of school funding are not designed to cover the cost of preventing summer learning loss. Funding is set to pay for around 180 days of school, not for year-round schools or a robust summer school program. Therefore, the freedom that schools have to address summer learning loss is not matched by the resources to actually do much about it.

One can imagine accountability systems that adjust for summer learning loss. But the reality is that almost any test-based accountability system likely to be adopted in practice will be quite imperfect.

For example, suppose we had a test-based accountability system that predicted year-to-year student learning gains by school as a function of a wide variety of school characteristics, including the percentage of students eligible for free and reduced-price lunch. Even such a system would suffer from the problem that free and reduced-price lunch status is only a very rough proxy for student socioeconomic status and other student characteristics that predict achievement gains. The ability of social scientists to find rough adjustments for school characteristics that work on average does not mean that the resulting measures are highly accurate for each and every school.
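The point about rough proxies can be sketched with a toy calculation. The code below assumes a very simple adjustment: regress each school's gain on its free and reduced-price lunch share and treat the residual as the school's "value-added". The four schools, the `unobserved` factor, and all numbers are made up for illustration; real systems use many more variables and more elaborate models.

```python
# Hypothetical schools: (name, lunch-eligible share, unobserved factor, true quality).
# Every school is assumed to have the SAME true quality.
schools = [
    ("A", 0.2, 0.0, 5.0),
    ("B", 0.2, -2.0, 5.0),
    ("C", 0.8, 0.0, 5.0),
    ("D", 0.8, -2.0, 5.0),
]

# Assumed data-generating process: poverty and an unobserved student
# characteristic both lower measured gains, independent of school quality.
gains = {name: quality - 5.0 * frl + unobserved
         for name, frl, unobserved, quality in schools}

# Ordinary least squares of gain on lunch share, computed by hand.
xs = [frl for _, frl, _, _ in schools]
ys = [gains[name] for name, _, _, _ in schools]
x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)
slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
         / sum((x - x_mean) ** 2 for x in xs))
intercept = y_mean - slope * x_mean

# Residual = measured gain minus the gain predicted from lunch share alone.
residuals = {name: gains[name] - (intercept + slope * frl)
             for name, frl, _, _ in schools}

for name, value in residuals.items():
    print(f"School {name}: estimated value-added = {value:+.2f}")
```

In this toy setup the adjustment works on average, since the residuals sum to zero, but schools of identical quality are still ranked differently, because the lunch share cannot capture the unobserved factor.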

If the test-based accountability system provides imperfect measures of school quality, then one has to worry about unintended consequences and side effects. For example, as Ladd and others have pointed out, an accountability system that is based only on a limited group of tests, and that has significant consequences for funding or regulatory sanctions, will inevitably lead schools to focus too heavily on improving performance on those tests. If the tests are not a satisfactory overall measure of what we want schools to do, this focus is problematic.

Critics often point to the difficulty of measuring student performance in the arts, as well as the fact that testing programs tend to be focused on reading and math and not other subjects. But there’s also the issue of “soft skills”. If schools should in part be trying to teach students how to deal with peers and leadership figures, and how to work in teams, and if such “soft skills” are hard to assess through standardized tests, then accountability systems that are narrowly based on test score gains will tend to discourage schools from adequately developing soft skills.

Is there an alternative? Yes. Ladd argues for a more holistic type of evaluation. This would be based on “the school inspectorates that are common in many countries around the world”. These systems involve regular outside evaluation and performance auditing of schools. They include tests, but also include human judgment and observations of what is going on in the classroom. I would also include information from surveys of parents, teachers, and in some cases students to see how they perceive the school’s performance.

This discussion is relevant not only to K-12 education. For early childhood education, a holistic approach to evaluating preschool quality is even more important.

As I have outlined before, we should regularly collect data on student learning in pre-K. This includes collecting data from age-appropriate standardized tests, at pre-K entrance and kindergarten entrance. These tests can be used to get a rough idea of student progress that is due to the pre-K program versus progress just due to aging.

But we need to be careful about how we use such data. First, we need to make sure that we collect data not only on hard skills, but on soft skills – how much progress students are making in their behavior and other social skills.

Second, these test score measures should not be the sole or even the primary gauge of quality for a specific preschool. Rather, the evaluation process for individual preschools should be based on a regular audit that includes observations of classroom practices and policies, and interviews and surveys of teachers, administrators, and parents.

Accountability in education is important. But the devil’s in the details. Let’s get the details right.

Tim Bartik is a senior economist at the Upjohn Institute for Employment Research, a non-profit and non-partisan research organization in Kalamazoo, Michigan. His research specializes in state and local economic development policies and local labor markets.
