We believe that the biological features obtained in this pioneering work provide useful insights into the formation and function of citrullination, and that the optimal classifier could be a useful tool for identifying citrullination sites in protein sequences.
Background:
As a newly uncovered post-translational modification on the
ε-amino group of lysine residues, protein malonylation has been found to be involved in
metabolic pathways and certain diseases. Apart from experimental approaches, several
computational methods based on machine learning algorithms have recently been proposed to
predict malonylation sites. However, previous methods did not address the imbalance in
data size between positive and negative samples.
Objective:
In this study, we identified the significant features of malonylation sites using a
novel computational method that applies machine learning algorithms and balances the data
sizes with the synthetic minority over-sampling technique (SMOTE).
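The over-sampling idea mentioned above can be illustrated with a minimal sketch. It assumes numeric feature vectors and plain Euclidean distance; the function name `smote`, its parameters, and the toy data are hypothetical and are not taken from the study, which does not state its implementation here.

```python
import random

def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority-class neighbours.

    Illustrative sketch only: `minority` is a list of equal-length numeric vectors.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Place the new sample at a random point on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy minority class (made-up 2-D points standing in for encoded positive sites).
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_synthetic = smote(toy_minority, n_new=6, k=2)
```

Each synthetic vector lies on a line segment between two real minority samples, so the positive class grows without exact duplication of existing points.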
Method:
Four types of features, namely, amino acid (AA) composition, position-specific
scoring matrix (PSSM), AA factor, and disorder, were used to encode residues in protein
segments. Then, a two-step feature selection procedure, consisting of maximum relevance
minimum redundancy and incremental feature selection together with the random forest
algorithm, was performed on the constructed hybrid feature vector.
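The incremental feature selection step can be sketched as follows, assuming the features have already been ranked (for example by maximum relevance minimum redundancy). The `evaluate` callable is a hypothetical stand-in for training and scoring a classifier such as a random forest; the feature names and scores below are made-up toy values, not results from the study.

```python
def incremental_feature_selection(ranked_features, evaluate):
    """Score the top-1, top-2, ... feature subsets and keep the best-scoring one.

    `evaluate` maps a list of feature names to a score (higher is better).
    """
    best_score, best_subset = float("-inf"), []
    for n in range(1, len(ranked_features) + 1):
        subset = ranked_features[:n]
        score = evaluate(subset)
        if score > best_score:
            best_score, best_subset = score, subset
    return best_subset, best_score

# Hypothetical ranked feature names and toy scores keyed by subset size.
ranked = ["pssm_1", "disorder_3", "aa_comp_K", "aa_factor_2"]
toy_scores = {1: 0.20, 2: 0.31, 3: 0.35, 4: 0.33}
best_subset, best_score = incremental_feature_selection(
    ranked, lambda subset: toy_scores[len(subset)]
)
# best_subset is the first three features; best_score is 0.35
```

In practice `evaluate` would wrap cross-validated training of the chosen classifier on the columns named in `subset`.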
Results:
An optimal classifier was built from the optimal feature subset and achieved an
F1-measure of 0.356. Feature analysis was performed on several selected important features.
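For reference, the F1-measure is the harmonic mean of precision and recall. The sketch below computes it from confusion-matrix counts; the function name and the example counts are illustrative assumptions and do not reproduce the study's data.

```python
def f1_measure(tp, fp, fn):
    """F1 = 2 * precision * recall / (precision + recall), with zero-division guards."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Made-up counts: 50 true positives, 50 false positives, 50 false negatives.
example_f1 = f1_measure(tp=50, fp=50, fn=50)  # precision = recall = 0.5, so F1 = 0.5
```

Because the F1-measure ignores true negatives, it is a common choice when negative samples greatly outnumber positive ones.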
Conclusion:
Results showed that certain types of PSSM and disorder features may be
closely associated with malonylation of lysine residues. Our study contributes to the
development of computational approaches for predicting malonyllysine and provides
insights into the molecular mechanism of malonylation.