Purpose
Clinic-based or community-based interventions can improve adherence to guideline-directed medication therapies (GDMTs) among patients with heart failure (HF). However, opportunities for such interventions are frequently missed, as providers may be unable to recognise risk patterns for medication non-adherence. Machine learning algorithms can help identify patients with a high likelihood of non-adherence. While a number of multilevel factors influence adherence, prior models predicting non-adherence have been limited by data availability. To improve on existing models, we have established an electronic health record (EHR)-based cohort with comprehensive data elements from multiple sources. We linked EHR data with pharmacy refill data for real-time incorporation of prescription fills, and with social determinants data to incorporate neighbourhood factors.

Participants
The cohort comprises patients seen at a large health system in New York City (NYC) who were >18 years old, had a diagnosis of HF or a reduced ejection fraction (<40%) since 2017, had at least one clinical encounter between 1 April 2021 and 31 October 2022, and had an active prescription for any of the four GDMT classes during the study period: beta-blocker; angiotensin-converting enzyme inhibitor (ACEi)/angiotensin receptor blocker (ARB)/angiotensin receptor neprilysin inhibitor (ARNI); mineralocorticoid receptor antagonist (MRA); and sodium-glucose cotransporter 2 inhibitor (SGLT2i). Patients with a non-geocodable address or an address outside the continental USA were excluded.

Findings to date
Among 39 963 patients in the cohort, the mean age was 73±14 years, 44% were female and 48% were current or former smokers. Common comorbid conditions were hypertension (77%), cardiac arrhythmias (56%), obesity (33%) and valvular disease (33%). During the study period, 33 606 (84%) patients had an active prescription for a beta-blocker, 32 626 (82%) for an ACEi/ARB/ARNI, 11 611 (29%) for an MRA and 7472 (19%) for an SGLT2i. Ninety-nine per cent were from urban metropolitan areas.

Future plans
We will use the established cohort to develop a machine learning model to predict medication adherence, and to support ancillary studies assessing factors associated with adherence. For external validation, we will include data from an additional hospital system in NYC.
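
The eligibility criteria listed under Participants can be read as a single filter applied to each patient record. The following is a minimal sketch of that reading, not the study's actual extraction code. The `Patient` record, its field names and the GDMT class labels are hypothetical, and it assumes that the "since 2017" condition and the active-prescription check have already been resolved upstream into the fields shown.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Set

# Encounter window stated in the abstract.
STUDY_START = date(2021, 4, 1)
STUDY_END = date(2022, 10, 31)

# Hypothetical labels for the four GDMT classes named in the abstract.
GDMT_CLASSES = {"beta_blocker", "acei_arb_arni", "mra", "sglt2i"}


@dataclass
class Patient:
    """Hypothetical, pre-summarised patient record (all field names are assumptions)."""
    age_years: int
    has_hf_diagnosis_since_2017: bool
    lowest_ef_percent_since_2017: Optional[float]  # None if never measured
    encounter_dates: List[date]
    active_gdmt_classes: Set[str]  # classes with an active prescription in the study period
    address_geocodable: bool
    in_continental_usa: bool


def is_eligible(p: Patient) -> bool:
    """Apply the inclusion and exclusion criteria as read from the abstract."""
    # HF diagnosis OR reduced ejection fraction (<40%).
    hf_or_reduced_ef = p.has_hf_diagnosis_since_2017 or (
        p.lowest_ef_percent_since_2017 is not None
        and p.lowest_ef_percent_since_2017 < 40
    )
    # At least one encounter inside the window (bounds treated as inclusive).
    seen_in_window = any(STUDY_START <= d <= STUDY_END for d in p.encounter_dates)
    # Active prescription for any of the four GDMT classes.
    on_any_gdmt = bool(p.active_gdmt_classes & GDMT_CLASSES)
    # Exclusions: non-geocodable address or address outside the continental USA.
    address_ok = p.address_geocodable and p.in_continental_usa
    return (
        p.age_years > 18
        and hf_or_reduced_ef
        and seen_in_window
        and on_any_gdmt
        and address_ok
    )
```

Treating the window bounds as inclusive and the age threshold as strictly greater than 18 follows the wording of the abstract; how the study handled edge cases such as missing ejection fraction values is not stated, so the `None` handling here is an assumption.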