ProThermDB is an updated version of the thermodynamic database for proteins and mutants (ProTherm), which has ∼31 500 data on protein stability, an increase of 84% from the previous version. It contains several thermodynamic parameters such as melting temperature, free energy obtained with thermal and denaturant denaturation, enthalpy change and heat capacity change along with experimental methods and conditions, sequence, structure and literature information. Besides, the current version of the database includes about 120 000 thermodynamic data obtained for different organisms and cell lines, which are determined by recent high throughput proteomics techniques using whole-cell approaches. In addition, we provided a graphical interface for visualization of mutations at sequence and structure levels. ProThermDB is cross-linked with other relevant databases, PDB, UniProt, PubMed etc. It is freely available at https://web.iitm.ac.in/bioinfo2/prothermdb/index.html without any login requirements. It is implemented in Python, HTML and JavaScript, and supports the latest versions of major browsers, such as Firefox, Chrome and Safari.
Motivation
Machine learning techniques require various descriptors from protein and nucleic acid sequences to understand/predict their structure and function as well as distinguishing between disease and neutral mutations. Hence, availability of a feature extraction tool is necessary to bridge the gap.
Results
We developed a comprehensive web-based tool, Seq2Feature, which computes 252 protein and 41 DNA sequence-based descriptors. These features include physicochemical, energetic and conformational properties of proteins, mutation matrices and contact potentials as well as nucleotide composition, physicochemical and conformational properties of DNA. We propose that Seq2Feature could serve as an effective tool for extracting protein and DNA sequence-based features as applicable inputs to machine learning algorithms.
Availability and implementation
https://www.iitm.ac.in/bioinfo/SBFE/index.html.
Supplementary information
Supplementary data are available at Bioinformatics online.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.