Predicting Post-ERCP Pancreatitis Using Machine Learning: Risk Stratification and Feature Importance Analysis

11/19/2025 10:43:23 AM
Introduction

Endoscopic retrograde cholangiopancreatography (ERCP) is a key diagnostic and therapeutic procedure for managing pancreatobiliary disorders. Despite its utility, the procedure carries a risk of complications, even in experienced hands. The most common adverse event associated with ERCP is post-ERCP pancreatitis (PEP), which occurs in approximately 10.2% of patients, rising to 14.1% among high-risk individuals. Multiple patient- and procedure-related risk factors have been identified for PEP. Nevertheless, despite advances in understanding these risk factors, PEP remains difficult to predict. Consequently, novel strategies for predicting this adverse event are needed. We aimed to develop machine learning (ML) models to predict PEP risk, with the goal of improving peri-procedural risk assessment and ultimately enabling targeted prevention strategies.

Method

This study was performed using data from an existing prospective ERCP registry. Baseline and follow-up data were collected for patients with a native papilla who underwent ERCP between 2022 and 2024 at a tertiary referral center in Iran. Patients younger than 18 years, those who were pregnant, and those with altered luminal gastrointestinal anatomy, unsuccessful biliary cannulation, chronic pancreatitis, an ampullary mass, or pancreatic duct (PD) stent placement were excluded. PEP diagnosis was confirmed through a prospective case review using the revised Atlanta classification criteria. A CatBoost model was first used to rank the importance of all candidate features. From this analysis, 19 features were identified as contributing to PEP risk. Twenty-five feature sets were then created, each containing between 5 and 19 features. A CatBoost model was trained on each feature set, and the model with the highest area under the receiver operating characteristic curve (AUC-ROC) was selected for further evaluation on an independent validation set. Two risk cut-offs were defined based on the distribution of predicted risk probabilities and expert knowledge, dividing patients into low-, borderline-, and high-risk categories. SHapley Additive exPlanations (SHAP) analysis was used to determine feature importance in the selected model.
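
The selection and stratification steps described above can be illustrated with a minimal, dependency-free sketch. It is not the study's code: the function names, the toy labels and scores, and the two cut-off values (0.10 and 0.30) are hypothetical, since the actual cut-offs are not reported in this abstract. The AUC-ROC is computed here by pairwise comparison of positive and negative cases, which is one standard way to obtain it on small data.

```python
from typing import Dict, Sequence


def auc_roc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC-ROC as the probability that a random positive case receives a
    higher score than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def best_feature_set(auc_by_set: Dict[str, float]) -> str:
    """Return the name of the feature set with the highest AUC-ROC."""
    return max(auc_by_set, key=lambda name: auc_by_set[name])


def risk_category(prob: float, low_cut: float, high_cut: float) -> str:
    """Map a predicted PEP probability to one of three risk categories."""
    if not 0.0 <= low_cut < high_cut <= 1.0:
        raise ValueError("cut-offs must satisfy 0 <= low_cut < high_cut <= 1")
    if prob < low_cut:
        return "low"
    if prob < high_cut:
        return "borderline"
    return "high"


# Toy data invented for illustration only (not taken from the registry).
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
print("AUC-ROC:", auc_roc(labels, scores))

# HYPOTHETICAL cut-offs; the study's actual values are not given here.
LOW_CUT, HIGH_CUT = 0.10, 0.30
print([risk_category(p, LOW_CUT, HIGH_CUT) for p in scores])
```

In this sketch a probability exactly equal to a cut-off falls into the higher category; whether the study treated boundaries this way is not stated.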

Results

Of 1,330 screened patients, 1,190 met the inclusion criteria, and 170 (14.3%) developed PEP. The best-performing CatBoost model (Figure 1a-b) included eight features: age, sex, abnormal papilla morphology, PD cannulation, difficult cannulation, abnormal bilirubin levels, common bile duct diameter, and successful stone extraction. This model achieved an AUC-ROC of 0.688 (95% confidence interval [CI]: 0.647–0.718), a sensitivity of 0.704 (95% CI: 0.643–0.738), a specificity of 0.672 (95% CI: 0.614–0.708), a positive predictive value (PPV) of 0.265 (95% CI: 0.236–0.292), and a negative predictive value (NPV) of 0.920 (95% CI: 0.889–0.948). The calibration curve (Figure 1c) demonstrated good alignment between predicted probabilities and observed outcomes. Stratification into low-, borderline-, and high-risk groups (Figure 1d) showed a clear gradient in PEP incidence (high-risk: 40%, low-risk: 5%). SHAP analysis revealed complex, asymmetrical, nonlinear interactions among the features (Figure 2).
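
As a rough consistency check on the reported metrics, PPV and NPV can be derived from sensitivity, specificity, and prevalence using Bayes' rule. The sketch below assumes the overall cohort prevalence (170 of 1,190) applies; the function name is hypothetical, and the abstract does not state which subset the reported PPV and NPV were computed on, so exact agreement is not expected.

```python
from typing import Tuple


def ppv_npv(sensitivity: float, specificity: float, prevalence: float) -> Tuple[float, float]:
    """Derive PPV and NPV from sensitivity, specificity, and prevalence."""
    tp = sensitivity * prevalence                # true-positive fraction
    fn = (1.0 - sensitivity) * prevalence        # false-negative fraction
    tn = specificity * (1.0 - prevalence)        # true-negative fraction
    fp = (1.0 - specificity) * (1.0 - prevalence)  # false-positive fraction
    return tp / (tp + fp), tn / (tn + fn)


# Assumption: prevalence taken from the whole included cohort.
prevalence = 170 / 1190
ppv, npv = ppv_npv(sensitivity=0.704, specificity=0.672, prevalence=prevalence)
print(f"PPV ~ {ppv:.3f}, NPV ~ {npv:.3f}")
```

Under this assumption the derived values are about 0.26 for PPV and 0.93 for NPV, close to the reported 0.265 and 0.920. The low PPV alongside a high NPV is what a PEP prevalence of roughly 14% would produce at this sensitivity and specificity.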

Conclusion

We have developed an ML model that produces meaningful PEP risk assessments using readily available clinical data. ML algorithms were used to elucidate the complex relationships among clinical predictors of PEP. Future work is needed to validate the model's performance on external and prospective data.