Aim
The aim of this study was to compare clinical decision-making for benzodiazepine deprescribing between healthcare providers (HCPs) and an artificial intelligence (AI) chatbot, GPT-4 (ChatGPT-4).

Methods
We analysed real-world data from a Croatian cohort of community-dwelling patients taking benzodiazepines (n=154) within the EuroAgeism H2020 ESR 7 project. HCPs evaluated the data using pre-established deprescribing criteria to assess the potential for benzodiazepine discontinuation. The research team devised and tested AI prompts to ensure consistency with HCP judgments. An independent researcher used ChatGPT-4 with the predetermined prompts to simulate clinical decisions for each patient case. Decisions made by HCPs and by ChatGPT-4 were compared using agreement rates and Cohen's kappa.

Results
Both HCPs and ChatGPT-4 identified patients eligible for benzodiazepine deprescribing (96.1% and 89.6%, respectively), with an agreement rate of 95% (κ=0.200, p=0.012). Agreement on the four deprescribing criteria ranged from 74.7% to 91.3% (lack of indication κ=0.352, p<0.001; prolonged use κ=0.088, p=0.280; safety concerns κ=0.123, p=0.006; incorrect dosage κ=0.264, p=0.001). Important limitations of the ChatGPT-4 responses were identified, including ambiguous outputs (22.1%), generic answers, and inaccuracies, which pose a risk of inappropriate decision-making.

Conclusion
Although agreement between the AI chatbot and HCPs was substantial, relying on AI alone poses a risk of unsuitable clinical decision-making. The findings reveal both strengths and areas for improvement in ChatGPT-4's deprescribing recommendations within a real-world sample. Our study underscores the need for further research on chatbot performance in patient therapy decision-making, fostering the advancement of AI toward optimal performance.
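For reference, Cohen's kappa reported above quantifies agreement between HCP and ChatGPT-4 decisions beyond what would be expected by chance. The standard definition (a general formula, not specific to this study's data) is

\[
\kappa = \frac{p_o - p_e}{1 - p_e},
\]

where \(p_o\) is the observed proportion of agreeing decisions and \(p_e\) is the proportion of agreement expected by chance, computed from each rater's marginal decision frequencies.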