Background Sepsis-induced acute lung injury (ALI) is a heterogeneous syndrome with high incidence and mortality. Diagnosis is often delayed because it requires chest imaging. Identifying diagnostic biomarkers may improve screening, allowing earlier identification of septic patients at high risk of ALI, and may point to potentially effective therapeutic drugs. Gene signatures obtained from peripheral blood have been shown to be dysregulated in sepsis and sepsis-induced ALI, and could provide an additional noninvasive means of diagnosis. Machine learning algorithms are powerful methods that can improve our ability to find relevant features in large, high-dimensional gene expression data. This study aimed to develop a robust diagnostic model for predicting sepsis-induced ALI using multiple machine learning algorithms, and to validate its predictive capability in external datasets.

Methods The datasets were obtained from the GEO and ArrayExpress databases. Following quality control and normalization, three datasets (GSE66890, GSE10474 and GSE32707) were merged as the training set, and four machine learning feature selection methods (elastic net, support vector machine (SVM), random forest and XGBoost) were applied to construct the diagnostic model. The remaining datasets served as validation sets. We then explored the functions of the selected features and assessed the correlation between the selected features and immune cells. To further evaluate the performance and predictive value of the diagnostic model, a nomogram, decision curve analysis (DCA) and a clinical impact curve (CIC) were constructed. Finally, potential small-molecule compounds interacting with the selected features were explored in the CTD database.

Results GSEA indicated that immune response and metabolism might play important roles in the pathogenesis of sepsis-induced ALI. Then, 52 genes were identified as putative biomarkers by consensus feature selection across the four methods.
Among them, 5 genes (ARHGDIB, ALDH1A1, TACR3, TREM1 and PI3) were selected by all methods and used to predict ALI with high accuracy. In the external datasets (E-MTAB-5273 and E-MTAB-5274), the diagnostic model achieved AUC values of 0.725 and 0.833, respectively. In addition, the nomogram, DCA and CIC supported the performance and predictive value of the diagnostic model. Finally, four small-molecule compounds (curcumin, tretinoin, estradiol and dexamethasone) were screened as potential therapeutic agents for sepsis-induced ALI.

Conclusion The consensus of multiple machine learning algorithms identified 5 genes (ARHGDIB, ALDH1A1, TACR3, TREM1 and PI3) that distinguished septic patients with ALI from those without. The diagnostic model could identify septic patients at high risk of ALI and may provide promising therapeutic targets for sepsis-induced ALI.
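
The consensus step described in Methods, and the AUC used for validation, can be illustrated with a minimal Python sketch. It is not the study's pipeline: the per-method gene sets (`g1` to `g6`), the helper names `consensus_features` and `roc_auc`, and the toy labels and scores are all hypothetical placeholders. The sketch assumes each of the four selectors has already returned a set of retained features and only shows how such sets might be combined.

```python
from collections import Counter


def consensus_features(selections, min_votes=None):
    """Return features kept by at least `min_votes` selectors, sorted by name.

    selections: dict mapping a method name to the set of features it retained.
    min_votes:  defaults to the number of methods, i.e. a strict intersection.
    """
    if not selections:
        return []
    if min_votes is None:
        min_votes = len(selections)
    # Count, for every feature, how many methods retained it.
    votes = Counter(f for chosen in selections.values() for f in set(chosen))
    return sorted(f for f, n in votes.items() if n >= min_votes)


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical selector outputs (placeholder gene IDs, not study results).
toy_selections = {
    "elastic_net": {"g1", "g2", "g3", "g5"},
    "svm": {"g1", "g2", "g4"},
    "random_forest": {"g1", "g2", "g3"},
    "xgboost": {"g1", "g2", "g3", "g6"},
}

strict = consensus_features(toy_selections)                # kept by all four
relaxed = consensus_features(toy_selections, min_votes=3)  # kept by >= 3

# Hypothetical labels (1 = ALI, 0 = sepsis without ALI) and model scores.
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2])

print(strict, relaxed, toy_auc)
```

Requiring agreement from all methods gives a smaller, more conservative signature, while lowering `min_votes` trades stringency for a larger candidate list. The pairwise AUC shown here is quadratic in sample size and is meant only to make the definition explicit.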