Background
The rapid development of large language model chatbots, such as ChatGPT, has created new possibilities for healthcare support. This study investigates the feasibility of integrating self-monitoring of hearing (via a mobile app) with ChatGPT's decision-making capabilities to assess whether specialist consultation is required. In particular, the study evaluated how the accuracy of ChatGPT's recommendations changed over observation periods of up to 12 months.

Methods
ChatGPT-4.0 was tested on a dataset of 1,000 simulated cases, each containing monthly hearing threshold measurements over periods of up to 12 months. Its recommendations were compared with the opinions of 5 experts using percent agreement and Cohen's Kappa. A multiple-response strategy, in which the most frequent recommendation from 5 trials was selected, was also analyzed.

Results
ChatGPT aligned strongly with the experts' judgments, with agreement scores ranging from 0.80 to 0.84. Agreement improved to 0.87 when the multiple-query strategy was employed. In cases where all 5 experts agreed unanimously, ChatGPT achieved a near-perfect agreement score of 0.99. ChatGPT adapted its decision-making criteria to the length of the observation period, seemingly accounting for potential random fluctuations in hearing thresholds.

Conclusions
ChatGPT has significant potential as a decision-support tool for monitoring hearing, matching expert recommendations and adapting effectively to time-series data. Existing hearing self-testing apps lack capabilities for tracking and evaluating changes over time; integrating ChatGPT could fill this gap. While not without limitations, ChatGPT offers a promising complement to self-monitoring: it can enhance decision-making and potentially encourage patients to seek clinical expertise when needed.

Graphical abstract
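
To make the analysis described in the Methods concrete, the following is a minimal sketch of how the multiple-response (majority-vote) strategy and the two agreement metrics, percent agreement and Cohen's Kappa, could be computed. The function names, labels ("refer"/"monitor"), and example data are illustrative assumptions, not the authors' actual pipeline or categories.

```python
from collections import Counter

def majority_vote(recommendations):
    """Multiple-response strategy: return the most frequent recommendation
    from repeated queries of the same case (e.g., 5 trials)."""
    return Counter(recommendations).most_common(1)[0][0]

def percent_agreement(rater_a, rater_b):
    """Fraction of cases on which the two raters give the same recommendation."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa: chance-corrected agreement between two raters."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)            # observed agreement
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # expected agreement if the two raters labeled cases independently
    p_e = sum(counts_a[c] * counts_b[c] for c in set(rater_a) | set(rater_b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical usage: 5 ChatGPT trials for one case, then agreement across cases
chatgpt_trials = ["refer", "refer", "monitor", "refer", "refer"]
print(majority_vote(chatgpt_trials))                     # -> "refer"

expert  = ["refer", "monitor", "refer", "monitor"]
chatgpt = ["refer", "monitor", "refer", "refer"]
print(percent_agreement(expert, chatgpt), cohens_kappa(expert, chatgpt))
```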