When policymakers and other on-the-ground decision makers—such as those in state agencies and schools—make choices under pressure and with limited time and resources, the stakes are high. Whether it’s improving student outcomes, advancing public health, or strengthening social programs, they need timely, trustworthy evidence to guide their decisions.
That’s why systematic evidence reviews are so important. These reviews synthesize evidence from high-quality research to help decision makers drive impact in their communities. Evidence reviews provide a trusted foundation for action, cutting through the noise to highlight the most reliable insights from research.
But these efforts have limitations. Traditional evidence reviews can’t always keep pace with the speed of decision making. Reviewers must read and assess every study against detailed criteria, carefully document their decisions, and complete extensive training. By the time an evidence review is published, decision makers might have moved on. And because reviews require significant time and funding, they often focus on a narrow set of topics, sometimes leaving pressing questions unanswered.
Against this background, recent advances in artificial intelligence (AI) are creating new opportunities to improve evidence reviews. By automating time-consuming steps, AI can make reviews faster, more efficient, and more responsive to the needs of decision makers. Together with our federal partners that administer evidence clearinghouses, Mathematica is now applying these tools to strengthen the impact of evidence reviews and help ensure policymakers, practitioners, and the communities they serve benefit from reliable insights that arrive in time to matter.
Building on these advances, Mathematica built and tested an AI-assisted evidence-of-effectiveness reviewer tool designed to help trained reviewers assess studies more efficiently.
Mathematica’s AI-assisted evidence-of-effectiveness reviewer
Mathematica’s new tool uses AI to generate a first-pass “seed review” of effectiveness studies. It assigns a preliminary evidence quality rating, explains the key factors behind that rating, and documents its reasoning. It also organizes outcome measures into domains, extracts findings, and flags missing information that might require follow-up with study authors. The tool can be tailored to any set of evidence quality criteria and any reporting template.
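To make the shape of such a first-pass output concrete, the sketch below shows one way a seed review could be represented as structured data. The field names (preliminary_rating, rating_rationale, missing_information, and so on) are illustrative assumptions for this post, not Mathematica’s actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical structure for a first-pass "seed review"; field names are
# illustrative assumptions, not the tool's actual schema.

@dataclass
class Finding:
    outcome_measure: str                    # e.g., "reading comprehension score"
    outcome_domain: str                     # domain the measure is grouped into
    effect_estimate: Optional[float] = None
    statistically_significant: Optional[bool] = None

@dataclass
class SeedReview:
    study_id: str
    preliminary_rating: str                 # preliminary evidence quality rating
    rating_rationale: str                   # key factors behind the rating
    reasoning_log: List[str] = field(default_factory=list)        # documented reasoning steps
    findings: List[Finding] = field(default_factory=list)         # extracted findings, by domain
    missing_information: List[str] = field(default_factory=list)  # items to follow up with study authors

# A trained reviewer would start from a seed review like this, verify each
# element against the study, and revise the rating and rationale as needed.
```

The key design point is that every element of the seed review maps to a step a human reviewer would otherwise document from scratch, so the reviewer's job shifts from drafting to verifying.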
Importantly, this tool helps reviewers conduct study reviews; it does not replace expert judgment. Human reviewers remain essential, and the tool’s benefits depend on thoughtful integration of its output into the human-led review process. Rather than supplanting that process, the tool streamlines it, freeing reviewers to focus on understanding the study and evaluating evidence quality, not just documenting conclusions. It highlights key issues quickly and, when used early in a review, helps keep the work on track and avoids costly course corrections later.
Testing shows that the tool provides accurate, logical explanations and applies evidence criteria consistently. Although it is better at explaining its reasoning than at assigning final ratings, it gets most ratings right, making just one potentially consequential error per review on average and correctly rating about two-thirds of studies. These results reinforce the need for human oversight—but also demonstrate how the tool can make evidence reviews faster and more efficient, helping reviewers tackle the most resource-intensive steps more quickly.
Transforming evidence reviews for tomorrow
The next step is clear: accelerate access to timely, reliable evidence for decision makers. AI innovations can make reviews faster and more efficient, but success depends on rigorous evaluation and thoughtful integration. Federal partners have a key role to play—by collaborating to test these tools, measure their impact on accuracy and efficiency, and scale approaches that strengthen evidence delivery and use. Together, we can transform evidence synthesis into a more powerful driver of impact.
Connect with us to learn more.
