A Statistical Approach for Identifying Private Wells Susceptible to Perfluoroalkyl Substances (PFAS) Contamination
- Millions of Americans have drinking water with PFAS concentrations that exceed provisional health guidelines.
- Classification random forest models performed the best among the various machine learning models tested at identifying private wells susceptible to PFAS contamination.
- Point sources such as the plastics/rubber and textile industries accounted for the highest contribution to model accuracy.
- This modeling approach utilizes nationally available predictors and can be extended to other regions.
Drinking water concentrations of per- and polyfluoroalkyl substances (PFAS) exceed provisional guidelines for millions of Americans. Data on private well PFAS concentrations are limited in many regions, and monitoring initiatives are costly and time-consuming. Here, we examine modeling approaches for predicting which private wells are likely to have detectable PFAS concentrations, so that the predictions can be used to prioritize monitoring initiatives. As predictors, we used nationally available data on PFAS sources and on the geologic, hydrologic, and soil properties that affect PFAS transport. We trained and evaluated the models using PFAS data (n ≈ 2,300 wells) collected by the state of New Hampshire between 2014 and 2017. Models were developed for the five most frequently detected PFAS: perfluoropentanoate, perfluorohexanoate, perfluoroheptanoate, perfluorooctanoate, and perfluorooctanesulfonate. Classification random forest models, which allow nonlinearity in interactions among predictors, performed the best (area under the receiver operating characteristic curve: 0.74–0.86). Point sources such as the plastics/rubber and textile industries accounted for the highest contribution to model accuracy. Groundwater recharge, precipitation, soil sand content, and hydraulic conductivity were secondary predictors. Our study demonstrates the utility of machine learning models for predicting PFAS in private wells, and the classification random forest model based on nationally available predictors is readily extendable to other regions.
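The performance figure quoted above, the area under the receiver operating characteristic curve (AUC), can be read as the probability that a model assigns a higher score to a randomly chosen well with detectable PFAS than to a randomly chosen well without. The following is a minimal sketch of that calculation using the rank-sum (Mann-Whitney) formulation. It is not the study's code: the function name `roc_auc` and the four example wells are hypothetical and made up for illustration only.

```python
def roc_auc(labels, scores):
    """Return the ROC AUC for binary labels (1 = detect, 0 = non-detect).

    Uses the rank-sum identity: AUC = U / (n_pos * n_neg), where U is the
    Mann-Whitney statistic of the positive class. Tied scores receive the
    average of the ranks they span.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")

    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the block of tied scores starting at position i.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        # 1-based ranks i+1 .. j share their average rank.
        avg_rank = (i + 1 + j) / 2.0
        for k in range(i, j):
            if pairs[k][1] == 1:
                rank_sum_pos += avg_rank
        i = j

    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


# Hypothetical example: four wells, two with detectable PFAS (label 1),
# and made-up predicted probabilities from some classifier.
example_labels = [0, 0, 1, 1]
example_scores = [0.10, 0.40, 0.35, 0.80]
print(roc_auc(example_labels, example_scores))  # prints 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of detects from non-detects, which gives a scale for interpreting the reported range of 0.74–0.86.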