Using Machine Learning to Explore Patient Safety Event Trends and Characteristics

2021 – 2026

Project Overview


To explore nationwide trends in patient safety incidents, Mathematica analyzes patient safety data using machine learning techniques such as natural language processing. We also implement and refine de-identification methods to protect the confidentiality of these data.

Project Motivation

AHRQ established the Patient Safety Organization Privacy Protection Center and the Network of Patient Safety Databases to collect patient safety data and explore the underlying causes of health care errors and risks. AHRQ seeks to understand patient safety trends to inform ongoing health care improvement. 

Partners in Progress

Cormac Corporation

Prepared For

U.S. Department of Health and Human Services, Agency for Healthcare Research and Quality

The Agency for Healthcare Research and Quality (AHRQ), the primary federal agency responsible for patient safety, maintains the Network of Patient Safety Databases to support exploration and monitoring of patient safety risks and concerns.

Patient safety organizations collect data from patient safety event reports completed by health care providers and submit them to the Patient Safety Organization Privacy Protection Center for de-identification and aggregation at the national level. These data detail events such as falls or medication-related incidents that occur while a patient is in a specific health care setting, such as a hospital. The Network of Patient Safety Databases (NPSD) then analyzes the data and shares them through public-facing dashboards, chartbooks, and reports to build awareness of patient safety trends and incident characteristics among researchers, health systems, and the public. Mathematica and our partner, Cormac, support AHRQ’s efforts by implementing and refining de-identification methods to protect the privacy of these data. In addition, we analyze the data to uncover patterns in incident characteristics and trends across time periods and patient demographics.

Report data from patient safety organizations follow the Common Formats for Event Reporting. The data contain a mix of structured fields, in which respondents select an existing answer value, and free text fields, in which respondents can add their own text if their answer does not easily map to an existing value. Although free text provides valuable event context, manual review of all the text is time-consuming and impractical. To make text review more efficient and to obtain an overview of the topics and themes discussed in these texts, Mathematica applies various natural language processing methods, which we highlight in the following examples:

  • To explore characteristics of medication-related events and other events (i.e., events that do not easily map to a specified category on the reporting form), we use natural language processing methods such as text clustering and topic modeling, which can find frequently used short phrases and common themes discussed by respondents. This type of analysis enables us to summarize the topics discussed in these texts and recognize topics that are not captured in the current structured data fields.
  • To understand whether these free text fields contain themes that align with the current Common Formats, we use large language models to classify the text using labels derived from the Common Formats data elements and answer values. This analysis enables us to explore alternative reporting methods that could reduce reporting burden.
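As a rough illustration of how frequently used short phrases can be surfaced from free text, the sketch below counts two-word phrases across a handful of narratives. It is a simplified stand-in for the text clustering and topic modeling described above, not the project's actual pipeline. The narratives, the stopword list, and the function name top_phrases are all made up for this example.

```python
import re
from collections import Counter

# Hypothetical, made-up free-text narratives (not NPSD data).
narratives = [
    "Patient received wrong dose of insulin at bedtime.",
    "Wrong dose of heparin was given; pharmacy notified.",
    "Nurse noticed wrong dose before the insulin was given.",
]

# Small illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"of", "the", "at", "was", "a", "an", "before"}


def top_phrases(texts, n=2, k=3):
    """Return the k most common n-word phrases across all texts.

    Stopwords are dropped before phrases are formed, so a phrase such as
    "dose insulin" may bridge a removed word ("dose of insulin").
    """
    counts = Counter()
    for text in texts:
        tokens = [t for t in re.findall(r"[a-z]+", text.lower())
                  if t not in STOPWORDS]
        counts.update(
            " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return counts.most_common(k)


phrases = top_phrases(narratives)
for phrase, count in phrases:
    print(f"{count}  {phrase}")
```

Phrase counts like these only hint at themes. Grouping related phrases into topics would require a clustering or topic modeling step on top of this.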

Some structured data fields present different analytical challenges and learning opportunities. For example, when reporters record multiple interventions for a fall, traditional statistical analysis does not readily provide information about these concurrent interventions. Mathematica applies a machine learning approach to better understand patterns of fall interventions. We use frequent pattern mining to find common combinations of prevention strategies. While rates of harm varied across patients with different risk factors, our analysis showed that the same commonly used interventions were in place across these groups, which may reveal opportunities for more tailored care.
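The idea behind frequent pattern mining can be sketched in a few lines: treat each fall report as a set of interventions and keep the combinations that appear in at least a minimum share of reports. The example below is a brute-force version under stated assumptions. The intervention names, the records, the 50 percent support threshold, and the function name frequent_itemsets are hypothetical, and production work would typically use a pruning algorithm such as Apriori or FP-Growth rather than enumerating every combination.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, made-up records: each set lists the interventions
# recorded on one fall report (not NPSD data).
reports = [
    {"bed alarm", "non-slip footwear", "call light in reach"},
    {"bed alarm", "call light in reach"},
    {"bed alarm", "non-slip footwear", "call light in reach"},
    {"non-slip footwear"},
    {"bed alarm", "call light in reach", "sitter"},
]


def frequent_itemsets(transactions, min_support=0.5, max_size=3):
    """Return {itemset: support} for itemsets meeting min_support.

    Support is the share of transactions containing every item in the
    itemset. This enumerates all combinations up to max_size, which is
    fine for small examples but does not scale like Apriori or FP-Growth.
    """
    n = len(transactions)
    counts = Counter()
    for items in transactions:
        for size in range(1, max_size + 1):
            # Sort so the same combination always yields the same tuple.
            for combo in combinations(sorted(items), size):
                counts[combo] += 1
    return {combo: c / n for combo, c in counts.items() if c / n >= min_support}


patterns = frequent_itemsets(reports)
for combo, support in sorted(patterns.items(), key=lambda kv: -kv[1]):
    print(f"{support:.0%}  {' + '.join(combo)}")
```

In this toy data, the bed alarm and call light pairing clears the threshold while pairings with non-slip footwear do not, which is the kind of concurrent-intervention pattern that single-variable tabulations would miss.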

In addition to providing data analysis that informs publicly available NPSD dashboards and chartbooks, Mathematica and Cormac have co-authored publications that explore patient safety outcomes, including an AHRQ Data Spotlight article on a subgroup analysis that used NPSD data to explore factors and clinical outcomes of falls. We also co-authored an AHRQ Data Spotlight article on frequent pattern mining analysis of fall interventions.

Related Staff

Sharon Zhao

Lead Data Scientist

Arnold Chen

Senior Researcher
