Background
The number of Individual Case Safety Reports (ICSRs) in pharmacovigilance databases is rapidly increasing worldwide. At the Netherlands Pharmacovigilance Centre Lareb, the majority of ICSRs are reviewed manually to identify potential signal-triggering reports (PSTRs) or ICSRs that need further clinical assessment for other reasons.
Objectives
To develop a prediction model that identifies ICSRs requiring clinical review, including PSTRs, and to identify the most important features of these reports.
Methods
All ICSRs (n = 30 424) received by Lareb between October 1, 2017 and February 26, 2021 were included. ICSRs originating from marketing authorisation holders and ICSRs reported on vaccines were excluded. The outcome was defined as PSTR (yes/no), where PSTR 'yes' was defined as an ICSR discussed at a signal detection meeting. Nineteen features were included, covering structured information on patients, adverse drug reactions (ADRs), or drugs. Data were divided into a training set (70%) and a test set (30%) using a stratified split to maintain the PSTR/no PSTR ratio. Logistic regression, elastic net logistic regression and eXtreme Gradient Boosting models were trained and tuned on the training set. Random down-sampling of negative controls was applied to the training set to adjust for class imbalance. Final models were evaluated on the test set. Model performance was assessed using the area under the receiver operating characteristic (ROC) curve (AUC) with 95% confidence interval; specificity and precision were assessed at the threshold giving perfect sensitivity (100%, so that no PSTRs are missed). Feature importance plots were inspected, and a selection of features was used to re-train and test the models with fewer features.
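The data-handling steps described above (stratified split, down-sampling of negatives in the training set, and evaluation at the 100% sensitivity threshold) can be sketched as follows. This is a minimal illustration in plain Python, not the code used in the study: the function names, the random seeds and the 1:1 down-sampling ratio are assumptions, since the source does not state them, and model fitting is omitted.

```python
import random


def stratified_split(labels, test_frac=0.3, seed=0):
    """Split indices into train/test so both parts keep the class ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def downsample_negatives(train_idx, labels, ratio=1.0, seed=0):
    """Keep every positive and a random subset of negatives.

    `ratio` is the number of negatives kept per positive (assumed 1:1 here).
    """
    rng = random.Random(seed)
    pos = [i for i in train_idx if labels[i] == 1]
    neg = [i for i in train_idx if labels[i] == 0]
    n_keep = min(len(neg), int(len(pos) * ratio))
    return sorted(pos + rng.sample(neg, n_keep))


def metrics_at_full_sensitivity(scores, labels):
    """Use the lowest score among positives as the threshold, so that every
    positive is flagged, and report specificity and precision there."""
    threshold = min(s for s, y in zip(scores, labels) if y == 1)
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    precision = tp / (tp + fp)
    return threshold, specificity, precision
```

Down-sampling is applied to the training indices only, so the test set keeps the original PSTR prevalence and the reported specificity and precision reflect the real class balance.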
Results
Of the included reports, 1439 (4.7%) were PSTRs. All three models performed similarly, with a highest AUC of 0.75 (95% CI 0.73–0.77). Although discrimination was moderate, specificity (5%) and precision (5%) at the 100% sensitivity threshold were low. The most important features were: 'absence of ADR in the Summary of product characteristics', 'ADR reported as serious', 'ADR labelled as an important medical event', 'ADR reported by physician' and 'positive rechallenge'. Model performance was similar when only nine of the most important features were used.
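The reported precision follows almost directly from the prevalence once sensitivity is fixed at 100% and specificity is only 5%. The short sketch below makes that arithmetic explicit. It assumes that the test-set prevalence equals the overall 4.7% (plausible given the stratified split) and it uses the rounded percentages from the text, so the outputs are approximate; the function names are illustrative.

```python
def precision_from_rates(prevalence, sensitivity, specificity):
    """Precision (positive predictive value) from prevalence and test rates."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


def flagged_fraction(prevalence, sensitivity, specificity):
    """Share of all reports that the model would still send to review."""
    return prevalence * sensitivity + (1.0 - prevalence) * (1.0 - specificity)


# Rounded values taken from the text: 4.7% PSTRs, 100% sensitivity, 5% specificity.
precision = precision_from_rates(0.047, 1.0, 0.05)
flagged = flagged_fraction(0.047, 1.0, 0.05)
print(f"precision ~ {precision:.3f}, flagged fraction ~ {flagged:.3f}")
```

Under these assumptions the precision comes out at about 5%, barely above the prevalence, and roughly 95% of reports would still be flagged for review at this threshold.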
Conclusions
We developed a prediction model with moderate performance that identifies PSTRs using nine commonly available features. The model needs to be optimised with more ICSR information (e.g., free text fields) to increase its precision before it can be implemented.