Balancing innovation and evidence in the use of AI chatbots for behavioral health

Nov 12, 2025

Across the country, the demand for behavioral health services continues to overwhelm available care. Over one-third of the U.S. population lives in an area federally designated as having a mental health workforce shortage, and one in four adults with mental illness reports unmet treatment needs. To try to fill these gaps and promote accessible care, people are increasingly turning to artificial intelligence (AI) chatbots for short-term behavioral health support, with some promising results. Chatbots can reduce feelings of isolation, offer low-cost psychoeducation, and serve as a stopgap while waiting for formal clinical care.

But the rapid adoption of AI-powered tools is outpacing the research on their safe use. In fact, the limited research that has emerged suggests that heavier reliance on chatbots could worsen outcomes over time by deepening loneliness, fostering emotional dependence, and reducing real-world social engagement. How can state leaders and policymakers use data and evidence to drive safe and responsible use of AI chatbots in behavioral health settings without curbing innovation?

The Promise of Access, the Need for Evidence

Generative AI chatbots use a large language model to simulate human-like conversation with users. For people facing waitlists for treatment, high costs of care, long commutes from rural areas to the nearest provider, or inconvenient clinic hours, a chatbot might feel like a more accessible option: care that seems better than no care at all.

But this perceived convenience is not the same as evidence-based clinical value. The chatbots people use are often not designed specifically for behavioral health purposes. In addition, chatbots are not backed by clinical expertise, cannot ensure client safety, and fall short of the human connection that trained clinicians provide. Most existing studies of chatbot use in behavioral health are small and short term, and they focus more on scripted chatbots, which offer a finite set of predetermined responses, than on the more sophisticated and less predictable generative AI chatbots. And because models are updated frequently, research evaluating their effectiveness struggles to keep pace. For example, OpenAI recently announced changes to ChatGPT's behavior in sensitive conversations, underscoring how quickly these models can shift in ways that affect their utility.

Rigorous and rapid evaluation is needed so policymakers and providers can take a timely, evidence-informed approach to assessing the role of AI chatbots in behavioral healthcare.

State Actions to Lead with Evidence

As state leaders explore legislative and policy options to improve access to behavioral healthcare for residents, they must confront a pivotal challenge: how to strike the right balance between addressing the risks and optimizing the potential benefits of using AI chatbots in behavioral health settings. Early state-level actions reflect an open but cautious approach to considering AI chatbots, highlighting the growing need for evidence to guide, strengthen, and refine future policymaking.

  • Utah enacted regulations that require mental health chatbots to provide clear disclosure, uphold privacy protections, and operate under oversight. The regulations emphasize transparency and include safeguards intended to minimize harm while supporting continued innovation.
  • Nevada enacted a law in July 2025 prohibiting AI systems from delivering therapy or counseling, while permitting limited administrative uses—such as scheduling or documentation—under the supervision of licensed professionals.
  • Illinois recently passed legislation that blocks AI from making mental health and therapeutic decisions but allows its use for administrative support services.
  • California enacted two significant measures (Senate Bill 243 and Assembly Bill 489) in October 2025 that establish legal safeguards for AI-driven companion and behavioral health chatbots. The laws require chatbots to clearly disclose that they are not human, mandate age verification and suicide prevention protocols, and prohibit misleading claims about clinical expertise.

In addition, a bipartisan bill introduced in Congress in late October would align with California’s requirements by (1) mandating age verification and clear disclosures that users are interacting with AI and (2) prohibiting minors from accessing AI companion chatbots.

Ensuring Safe and Effective Adoption

As AI chatbots become more embedded in behavioral healthcare, the path forward demands clarity, caution, and collaboration. We cannot risk deploying digital tools that could harm the very people who are seeking support. Nor should we mistake technological novelty and accessibility for clinical effectiveness and therapeutic integrity. When it comes to the safety of children, youth, and adults who need behavioral health services, enthusiasm for innovation must be matched and guided by rigorous evidence. 

To translate the perceived benefits of chatbots into real-world, evidence-based value, state leaders could prioritize the following actions:

  1. Evaluate effectiveness and safety. To make informed decisions, leaders need to ask targeted questions about how different AI tools are trained, which safeguards they include (such as crisis detection and escalation), how they protect user privacy, and what outcomes they have demonstrated. Resources such as Mathematica’s Readiness Guide for Behavioral Healthcare Organizations Considering AI can help decision makers structure these assessments and tailor them to AI chatbot–specific use cases.
  2. Establish guardrails. AI chatbot adoption in behavioral health should be cautious and evidence driven. States could require independent ethics reviews, pilot testing, or randomized trials to distinguish between rigorously evaluated digital therapeutics and general-purpose chatbots. Clear labeling and disclosures—whether the chatbot is for psychoeducation, administrative support, or therapeutic reinforcement—can also help manage risk.
  3. Balance innovation with oversight. To build trust and support public safety, policymakers should include input from clinicians, researchers, ethicists, advocates, behavioral healthcare consumers, and community representatives when making decisions. Human oversight is key—not just for accountability but to weigh benefits against risks and keep people at the center of care.

With deep, integrated expertise in behavioral health, evaluation, policy guidance, and technical assistance, Mathematica can help policymakers move from uncertainty to confident, evidence-based action surrounding the safe and effective use of AI chatbots for behavioral health.

About the Author

Chris Bory

Principal Researcher