Beyond Test Scores: Improving Research Evidence on Education

May 11, 2016

The central question in education policy—How well are schools preparing students for their futures?—cannot be answered by looking at test scores alone.

Last month, the Journal of Policy Analysis and Management published our paper (with Tim Sass of Georgia State and Ron Zimmer of Vanderbilt), which provided the first large-scale evidence of the effects of charter schools on the earnings of their graduates, years after graduation. Using statewide data from Florida, we found that students who attended charter high schools not only had higher graduation rates and higher rates of college entry (relative to a comparison group) but also had higher rates of college persistence and higher earnings in their mid-20s.

These findings were interesting, not only because they provided new evidence on charter schools’ long-term effects, but also because the positive effects on educational attainment and earnings would not have been predicted from the students’ test scores. The same charter schools that seem to be producing positive long-term effects did not have positive effects on short-term test scores.

Positive long-term effects, in many cases without positive test score effects, have also been observed in other studies of small, mission-driven high schools of choice, such as New York City’s small high schools; Washington, DC’s voucher schools; and Catholic high schools. All of these schools appear to be helping their students build skills with substantial long-term benefits that test scores do not capture. Interestingly, the same pattern of long-term benefits without persisting test score effects has been observed in research on high-quality early childhood programs.

All of these findings suggest that, in matters of education policy, improving research evidence will require attention to a wider range of outcomes than test scores alone—at least until we have tests that can effectively measure the “noncognitive” skills and behaviors that are necessary for success in school, work, citizenship, and life. (I confess that I sometimes envy my colleagues who work in health research, where what constitutes a favorable outcome is more often well defined and uncontroversial than in education.)

Specifically, researchers (and policymakers and educators) need measures that predict long-term outcomes but that don’t take many years to collect. Even though I’m proud of our new study of the long-term effects of charter schools, it is unfortunate that the field had to wait more than 20 years after charter schools were first introduced to produce such evidence. The students included in our study enrolled in high school between 1998 and 2001, quite a long time ago. It would be useful if future studies could examine predictors of long-term outcomes for current students, which could make findings much more timely.

Fortunately, measures of noncognitive skills and behaviors for current students are receiving considerable attention from researchers. Policymakers are taking notice as well. In the new Every Student Succeeds Act, Congress modified the outcome-based accountability requirements of the old No Child Left Behind Act, giving states new authority to couple test-based measures with another, unspecified measure of student success or school performance. A group of school districts in California has already chosen to add student “grit” to the test-based outcomes it measures in its schools.

Unfortunately, even the scholar who has brought grit to the forefront of public discussion (the University of Pennsylvania’s Angela Duckworth) acknowledges that current measures of grit are flawed. More generally, the field does not (yet) have many good measures of noncognitive skills that can be implemented at modest cost, that produce results consistent across schools and student populations, and that reliably predict students’ long-term outcomes. For example, as others have pointed out, existing measures of grit are plagued by “reference bias”: students tend to rate themselves relative to others around them, which undermines comparisons across schools.

In sum, the new attention to noncognitive measures of students’ skills and behavior is well supported by a growing body of research that demonstrates substantial long-term effects that are not strongly related to test scores. But until those measures are reliable, inexpensive, and comparable across schools, we researchers have a lot more work to do to improve research evidence on educational interventions and outcomes.
