Identifying Students At Risk Using Prior Performance Versus a Machine Learning Algorithm
This report provides information for administrators in local education agencies who are considering early warning systems to identify at-risk students. Districts use early warning systems to target resources to the most at-risk students and to intervene before students drop out. To make the best use of available resources, schools want the early warning system to accurately identify the students who need support. The report compares the accuracy of two approaches: simple flags based on prior academic problems in school (a prior performance early warning system) and an algorithm that uses a range of in- and out-of-school data to estimate the specific risk of each academic problem for each student in each quarter. Schools can use one or more risk-score cutoffs from the algorithm to create low- and high-risk groups. This study compares a prior performance early warning system to two risk-score cutoff options: a cutoff that identifies the same percentage of students as the prior performance early warning system, and a cutoff that identifies the 10 percent of students most at risk.
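The two cutoff options can be illustrated with a minimal Python sketch. The function names, the risk scores, and the prior performance flags below are hypothetical and are not taken from the study. The sketch assumes only that each student has one risk score for a given outcome and quarter.

```python
def top_n_flags(risk_scores, n_flag):
    """Flag the n_flag students with the highest risk scores."""
    # Rank student positions from highest to lowest score.
    # sorted() is stable, so tied scores keep their original order.
    order = sorted(range(len(risk_scores)), key=lambda i: -risk_scores[i])
    flagged = set(order[:n_flag])
    return [i in flagged for i in range(len(risk_scores))]


def same_percentage_flags(risk_scores, prior_flags):
    """Flag as many students as the prior performance flags identify."""
    return top_n_flags(risk_scores, sum(prior_flags))


def top_share_flags(risk_scores, share):
    """Flag a fixed share of students, such as the top 10 percent."""
    n_flag = round(len(risk_scores) * share)
    return top_n_flags(risk_scores, n_flag)


# Hypothetical data for ten students (illustration only).
risk_scores = [0.91, 0.15, 0.62, 0.08, 0.47, 0.33, 0.78, 0.05, 0.21, 0.55]
prior_flags = [True, False, True, False, False, False, True, False, False, True]

same_pct = same_percentage_flags(risk_scores, prior_flags)
top_10_pct = top_share_flags(risk_scores, 0.10)
```

With these made-up scores, the same-percentage cutoff flags four students, matching the count from the prior performance flags, while the 10-percent cutoff flags only the single highest-scoring student.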
The study finds that the prior performance early warning system and the algorithm using the same-percentage risk-score cutoffs are similarly accurate. Both approaches successfully identify most of the students who ultimately are chronically absent, have a low grade point average, or fail a course. In contrast, the algorithm with 10-percent cutoffs is good at targeting the students who are most likely to experience an academic problem; this approach has the advantage in predicting suspensions, which are rarer and harder to predict than the other outcomes. Both the prior performance flags and the algorithm are less accurate when predicting outcomes for students who are Black.
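The difference between identifying most students who experience a problem and targeting the students most likely to experience one can be sketched with two common measures. This summary does not state which accuracy measures the study used, so the choice of sensitivity and precision here is an assumption, and all names and data are hypothetical.

```python
def sensitivity(flags, outcomes):
    """Share of students who had the outcome and were flagged beforehand."""
    had_outcome = sum(outcomes)
    if had_outcome == 0:
        return None  # undefined when no student had the outcome
    caught = sum(1 for f, o in zip(flags, outcomes) if f and o)
    return caught / had_outcome


def precision(flags, outcomes):
    """Share of flagged students who went on to have the outcome."""
    flagged = sum(flags)
    if flagged == 0:
        return None  # undefined when no student was flagged
    correct = sum(1 for f, o in zip(flags, outcomes) if f and o)
    return correct / flagged


# Hypothetical data for ten students (illustration only).
outcomes = [True, False, True, False, True, False, True, False, False, False]
broad_flags = [True, False, True, False, False, False, True, False, False, True]
narrow_flags = [True] + [False] * 9

broad = (sensitivity(broad_flags, outcomes), precision(broad_flags, outcomes))
narrow = (sensitivity(narrow_flags, outcomes), precision(narrow_flags, outcomes))
```

In this made-up example the broad flag catches three of the four students who had the outcome, while the narrow flag catches only one of them but is correct about every student it flags.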
The findings suggest clear tradeoffs between the options. The prior performance early warning system is just as accurate as the algorithm for some purposes and is cheaper and easier to set up, but it does not provide fine-grained information that could be used to identify the students who are at greatest risk. The algorithm can distinguish degrees of risk among students, enabling a district to set cutoffs that vary depending on the prevalence of different outcomes, the harms of over-identifying versus under-identifying students at risk, and the resources available to support interventions.