Motivation: In this work, we aim to develop a computational approach for predicting DNA-binding sites in proteins from amino acid sequences. To avoid overfitting, all available DNA-binding proteins from the Protein Data Bank (PDB) are used to construct the models. The random forest (RF) algorithm is used because it is fast and performs robustly across different parameter values. A novel hybrid feature is presented that incorporates evolutionary information of the amino acid sequence, secondary structure (SS) information and orthogonal binary vector (OBV) information, which reflects the characteristics of the 20 kinds of amino acids for two physical–chemical properties (dipoles and volumes of the side chains). The numbers of binding and non-binding residues in proteins are highly unbalanced, so a novel scheme is proposed to deal with the problem of imbalanced datasets by downsizing the majority class.
Results: The results show that the RF model achieves 91.41% overall accuracy with a Matthews correlation coefficient of 0.70 and an area under the receiver operating characteristic curve (AUC) of 0.913. To our knowledge, the RF method using the hybrid feature is currently the computationally optimal approach for predicting DNA-binding sites in proteins from amino acid sequences without using three-dimensional (3D) structural information. We have demonstrated that the prediction results are useful for understanding protein–DNA interactions.
Availability: DBindR web server implementation is freely available at http://www.cbi.seu.edu.cn/DBindR/DBindR.htm.
Contact: xsun@seu.edu.cn
Supplementary information: Supplementary data are available at Bioinformatics online.
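As an illustration of the class-imbalance handling and the Matthews correlation coefficient mentioned above, the following is a minimal sketch, not the authors' scheme. The abstract does not describe how the majority class is downsized, so plain random undersampling to a 1:1 ratio is assumed here; the function names `downsample_majority` and `matthews_corrcoef` and the label convention (1 = binding residue, 0 = non-binding residue) are also assumptions made for this example.

```python
import math
import random


def downsample_majority(samples, labels, seed=0):
    """Randomly drop majority-class samples until both classes have equal size.

    Assumption: plain random undersampling; the paper's own scheme may differ.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority sample and an equally sized random subset of the majority.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [samples[i] for i in kept], [labels[i] for i in kept]


def matthews_corrcoef(y_true, y_pred):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy data: 3 binding residues among 20 (hypothetical values).
residues = list(range(20))
is_binding = [1] * 3 + [0] * 17
balanced_x, balanced_y = downsample_majority(residues, is_binding)
```

MCC is a useful companion to overall accuracy on data like this because a classifier that labels every residue as non-binding would score high accuracy but an MCC of 0.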
Recently, large-scale datasets have vastly facilitated development in nearly all domains of Natural Language Processing. However, there is currently no cross-task dataset in NLP, which hinders the development of multi-task learning. We propose MATINF, the first jointly labeled large-scale dataset for classification, question answering and summarization. MATINF contains 1.07 million question-answer pairs with human-labeled categories and user-generated question descriptions. Based on such rich information, MATINF is applicable to three major NLP tasks: classification, question answering, and summarization. We benchmark existing methods and a novel multi-task baseline over MATINF to inspire further research. Our comprehensive comparison and experiments over MATINF and other datasets demonstrate the merits held by
A priori attitude information can improve the success rate and reliability of Global Navigation Satellite System (GNSS) multi-antennae attitude determination. However, a priori attitude information is nonlinear, and integrating a priori information into the objective function rigorously will increase the complexity of an ambiguity domain search, such as the Multivariate Constrained-Least-squares Ambiguity Decorrelation Adjustment (MC-LAMBDA) method. In this paper, a new method based on attitude domain search is presented to make use of the a priori attitude angle information with high efficiency. First, the a priori information of pitch and roll is integrated into the search process to derive the analytic search step for attitude angle, and the integer candidates are determined by traversal search in the three-dimensional attitude domain. Then, the objective function is parameterised with Euler angles, and a non-iterative approximate method is utilised to simplify the iterative computation in calculating objective function values. Experimental results reveal that compared to the MC-LAMBDA method, our new method has the same success rate and reliability, but higher efficiency in making use of a priori attitude information.
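To make the idea of a traversal search in the three-dimensional attitude domain more concrete, here is a minimal sketch under stated assumptions. The abstract does not give the Euler angle convention, the analytic search step, or the objective function, so this example assumes a Z-Y-X (yaw, pitch, roll) convention and a fixed, user-chosen step, and it only enumerates candidate attitudes without evaluating any objective. The names `euler_to_matrix` and `attitude_grid` are hypothetical.

```python
import math


def euler_to_matrix(yaw, pitch, roll):
    """Rotation matrix R = Rz(yaw) * Ry(pitch) * Rx(roll); angles in radians.

    Assumption: Z-Y-X convention; the paper's convention is not stated.
    """
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def attitude_grid(pitch0, roll0, tol, step):
    """Yield (yaw, pitch, roll) candidates on a regular grid.

    Yaw covers the full circle; pitch and roll are restricted to within
    `tol` of their a priori values pitch0 and roll0. A fixed `step` stands
    in for the analytic search step derived in the paper.
    """
    n_yaw = int(round(2 * math.pi / step))
    n_off = int(tol / step + 1e-9)  # grid steps on each side of the prior
    for i in range(n_yaw):
        for j in range(-n_off, n_off + 1):
            for k in range(-n_off, n_off + 1):
                yield (i * step, pitch0 + j * step, roll0 + k * step)


# Example: 10 degree step, pitch and roll known a priori to within 20 degrees.
candidates = list(attitude_grid(0.0, 0.0, math.radians(20), math.radians(10)))
```

Constraining pitch and roll with a priori information shrinks the grid from a full three-dimensional sweep to a thin slab around the prior, which is the intuition behind the efficiency gain the abstract reports.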