Cell-penetrating peptides (CPPs) are short chains of amino acids that have shown remarkable potential to cross the cell membrane and deliver coupled therapeutic cargoes into cells. Designing and testing different CPPs to target specific cells or tissues is crucial to ensure high delivery efficiency and low toxicity. However, in vivo and in vitro testing of many CPPs is time-consuming and costly. This has driven interest in computational methods, such as Machine Learning (ML), as faster and cheaper routes to CPP design and uptake prediction. Yet most ML models developed to date perform classification rather than regression, because informative quantitative uptake values are scarce. To address these challenges, we developed POSEIDON, an open-access, up-to-date curated database that provides experimental quantitative uptake values for over 2,300 entries and physicochemical properties for 1,315 peptides. POSEIDON also records additional information for each entry, such as cell line, cargo, and sequence, among others. By combining this database with cell line genomic features, we assembled a dataset of over 1,200 entries and used it to develop an ML regression predictor of CPP uptake. On an independent test set, the POSEIDON predictor accurately estimated peptide uptake across cell lines, achieving a Pearson correlation of 0.87, a Spearman correlation of 0.88, and an R² score of 0.76. With its comprehensive and novel dataset and strong predictive performance, the POSEIDON database and its associated ML predictor represent a significant step forward in CPP research and development. Both are freely available through a user-friendly interface at https://moreiralab.com/resources/poseidon/, making them valuable resources for advancing research on CPP-related topics.
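The abstract reports three regression metrics on the independent test set. The sketch below shows how Pearson correlation, Spearman correlation, and the R² score can be computed from measured and predicted uptake values. It assumes that "R² score" means the coefficient of determination, 1 − SS_res/SS_tot, which is the definition scikit-learn's `r2_score` uses; this is not necessarily the authors' exact evaluation code. The function names and the toy uptake values are hypothetical and are not POSEIDON data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical measured vs. predicted uptake values for five test entries.
measured = [1.2, 0.8, 2.5, 3.1, 1.9]
predicted = [1.0, 0.9, 2.2, 3.4, 1.7]
print(pearson(measured, predicted), spearman(measured, predicted), r2_score(measured, predicted))
```

Reporting all three metrics is informative because each captures something different. Pearson measures linear agreement. Spearman only checks whether peptides are ranked in the right order. R² penalizes absolute prediction errors relative to the variance of the measurements.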
Scientific Contribution Statement: Our research addresses the critical need for more efficient and cost-effective methods in cell-penetrating peptide (CPP) research. We introduce POSEIDON, a comprehensive, freely accessible database that provides quantitative uptake values for over 2,300 entries, together with detailed physicochemical profiles for 1,315 peptides. Most current Machine Learning (ML) models for CPP design perform classification rather than regression. To address this limitation, we leveraged the rich dataset provided by POSEIDON to develop a highly accurate ML regression model for predicting CPP uptake.