Clinical trial emulation, the process of mimicking targeted randomized controlled trials (RCTs) with real-world data (RWD), has attracted growing attention and interest from the pharmaceutical industry in recent years. Unlike RCTs, which have stringent eligibility criteria for recruiting participants, RWD are more representative of the real-world patients to whom the drugs will be prescribed. One technical challenge for trial emulation is how to control confounding effectively with complex RWD so that treatment effects can be derived objectively. Many approaches, including deep learning algorithms, have recently been proposed for this purpose, but systematic evaluation and practical guidance are still lacking.
In this paper, we emulate 430,000 trials from two large-scale RWD warehouses, covering both electronic health records (EHRs) and general claims
for over 170 million patients and spanning more than 10 years, with the aim of identifying new indications of approved drugs for Alzheimer's disease (AD). We investigate the behavior of multiple approaches, including logistic regression and deep learning models, and propose a new model selection strategy that significantly improves the confounding balance between participants in the different arms of the emulated trials. We demonstrate that a regularized logistic-regression-based propensity score (PS) model outperforms deep-learning-based PS models and the other approaches, which contradicts our intuition to a certain extent. Finally,
we identify eight drugs whose original indications are not AD (pantoprazole, gabapentin, acetaminophen, atorvastatin, albuterol, fluticasone, amoxicillin and omeprazole) and that hold great potential to benefit AD patients.