Large language models (LLMs) built on artificial intelligence (AI), such as ChatGPT and GPT-4, hold immense potential to support, augment, or even replace psychotherapy. Enthusiasm for such applications is mounting in the field as well as in industry. These developments promise to address the insufficient capacity of mental healthcare systems and to scale individual access to personalized treatments. However, clinical psychology is an uncommonly high-stakes application domain for AI systems, as responsible and evidence-based therapy requires nuanced expertise. Here, we provide a roadmap for ambitious yet responsible applications of clinical LLMs. First, we discuss potential applications of clinical LLMs to clinical care, training, and research, emphasizing imminent applications while highlighting areas that present risk given the high-stakes, complex nature of psychotherapy. Second, we describe a continuum of clinical LLM applications, ranging from assistive to fully autonomous, that could be integrated into digital treatment modalities, analogous to the development of autonomous vehicle technology. Third, we outline recommendations for the responsible development of clinical LLMs, which should center clinical science and continuous improvement, involve robust interdisciplinary collaboration, and attend to issues such as assessment, risk detection, transparency, and bias. Fourth, we offer recommendations for the critical evaluation of clinical LLMs, arguing that psychologists are uniquely positioned to scope and guide their development and evaluation. Last, we present a vision for how LLMs might enable a new generation of studies of evidence-based interventions at scale, and how these studies may challenge assumptions about psychotherapy.