Background: Selective non-reporting of studies and study results undermines trust in randomized controlled trials (RCTs). Changes to clinical trial outcomes are sometimes associated with bias, and manually comparing trial documents to identify changes in trial outcomes is time-consuming.

Objective: This study aims to assess the capacity of the Generative Pretrained Transformer 4 (GPT-4) large language model to detect and describe changes in trial outcomes within ClinicalTrials.gov records.

Methods: We will first prompt GPT-4 to define trial outcomes using five elements (i.e., domain, specific measurement, specific metric, method of aggregation, and time point). We will then prompt GPT-4 to identify outcome changes between the prospective versions of registrations and the most recent versions of registrations. We will use a random sample of 150 RCTs (~1,500 outcomes) registered on ClinicalTrials.gov. We will include “Completed” trials categorized as “Phase 3” or “Not Applicable” with results posted on ClinicalTrials.gov. Two independent raters will rate GPT-4’s judgements, and we will assess GPT-4’s accuracy and reliability. We will also explore heterogeneity in GPT-4’s performance by year of trial registration and trial type (i.e., applicable clinical trials, NIH-funded trials, and other trials).

Discussion: We aim to develop methods that could assist systematic reviewers, peer reviewers, journal editors, and readers in monitoring changes in clinical trial outcomes, streamlining the review process, and improving the transparency and reliability of clinical trial reporting.
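
To make the first step of the Methods concrete, the sketch below shows one way the five-element prompting could be implemented. It is a minimal illustration only, not the study's actual prompt or pipeline: the prompt wording, the `model` string, and the `parse_outcome` helper are assumptions introduced here, and the example assumes the OpenAI Python client with an API key in the environment.

```python
# Illustrative sketch (not the authors' protocol): prompting GPT-4 to
# decompose a registered trial outcome into the five elements named above.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

FIVE_ELEMENTS_PROMPT = (
    "Decompose the following clinical trial outcome into five elements: "
    "(1) domain, (2) specific measurement, (3) specific metric, "
    "(4) method of aggregation, and (5) time point. "
    "Return one line per element.\n\nOutcome: {outcome}"
)

def parse_outcome(outcome_text: str) -> str:
    """Ask GPT-4 to structure a free-text outcome into the five elements."""
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder; a study would pin a specific model snapshot
        messages=[{
            "role": "user",
            "content": FIVE_ELEMENTS_PROMPT.format(outcome=outcome_text),
        }],
        temperature=0,  # deterministic output simplifies rater agreement checks
    )
    return response.choices[0].message.content

# Hypothetical usage with an invented outcome string:
print(parse_outcome(
    "Change from baseline in HbA1c at 24 weeks, measured by central laboratory assay"
))
```

The same structured output could then be generated for the prospective and most recent registration versions of each outcome and passed to a second prompt asking GPT-4 to flag element-level differences, mirroring the two-step comparison described in the Methods.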