The YTH (YT521-B homology) domain was identified by sequence comparison and is found in 174 different proteins expressed in eukaryotes. It is characterized by 14 invariant residues within an α-helix/β-sheet structure. Here we show that the YTH domain is a novel RNA binding domain that binds to a short, degenerate, single-stranded RNA sequence motif. The presence of the binding motif in alternative exons is necessary for YT521-B to directly influence splice site selection in vivo. Array analyses demonstrate that YT521-B predominantly regulates vertebrate-specific exons. An NMR titration experiment identified the binding surface for single-stranded RNA on the YTH domain. Structural analyses indicate that the YTH domain is related to the pseudouridine synthase and archaeosine transglycosylase (PUA) domain. Our data show that the YTH domain conveys RNA binding ability to a new class of proteins that are found in all eukaryotic organisms.
The binding of proteins to RNA is a fundamental aspect of biology that affects most aspects of gene expression and cellular functions. The presence of various binding motifs defines the group of RNA binding proteins (1). Commonly found RNA binding domains include the RNA recognition motif (RRM), the double-stranded RNA binding domain, the Piwi Argonaute and Zwille domain, and the heterogeneous nuclear ribonucleoprotein K homology domain. The most prominent RNA binding domain is the RRM, which is found in ~2% of human proteins (2). The RRM is composed of two consensus sequences, RNP2 and RNP1, that contain aromatic residues important for RNA binding. In other RNA binding motifs, such as the PUA (pseudouridine synthase and archaeosine transglycosylase) domain and the OB-fold (oligonucleotide/oligosaccharide binding fold), the RNA interacts with the β-sheets that form pseudobarrels (3).
The general composition of the PUA domain is reminiscent of the OB-fold, a nucleic acid binding motif that displays only a low degree of sequence similarity between its members. The OB-fold consists of two three-stranded antiparallel β-sheets, where strand 1 is shared by both sheets. The individual β-sheets can be separated by protein segments of different lengths, which makes identification based on primary structure difficult (4). The β-sheets in the PUA and OB-folds form a ligand binding surface that can bind to nucleic acids through aromatic stacking, hydrogen bonding, as well as polar and hydrophobic interactions. The so-far unexplained RNA binding activities of proteins such as apontic (5) demonstrate that not all RNA binding domains have been described.
One of the potentially new RNA binding domains is the YTH (YT521-B homology) domain. The YTH domain is highly conserved during evolution and was identified by comparing all known protein sequences with the splicing factor YT521-B (6). The domain is found only in eukaryotes and is abundant in plants. The YTH domain can be between 100 and 150 amino acids in size and is characterized by 14 invariant and 19 highly conserved residues. It is predicted to contain ...
RNA binding proteins recognize RNA targets in a sequence-specific manner. Apart from the sequence, the secondary structure context of the binding site also affects the binding affinity. Binding sites are often located in single-stranded RNA regions, and it has been shown that sequestering a binding motif in a double-stranded region abolishes protein binding. It is therefore desirable to include knowledge about RNA secondary structure when searching for the binding motif of a protein. We present MEMERIS, an approach that searches for sequence motifs in a set of RNA sequences while simultaneously integrating information about secondary structure. To abstract from specific structural elements, we precompute position-specific values that measure the single-strandedness of all substrings of an RNA sequence. These values are used as prior knowledge about the motif starts to guide the motif search. Extensive tests with artificial and biological data demonstrate that MEMERIS is able to identify motifs in single-stranded regions even if a stronger motif located in double-stranded parts exists. The discovered motif occurrences in biological datasets mostly coincide with known protein-binding sites. This algorithm can be used for finding the binding motif of single-stranded RNA-binding proteins in SELEX or other biological sequence data.
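The idea of turning single-strandedness values into a prior over motif starts can be illustrated with a minimal sketch. It is not the MEMERIS implementation: it assumes that a per-base unpaired probability is already available for each position (for example from an RNA folding program), it uses the mean unpaired probability of a window as one plausible single-strandedness measure, and the function names and the pseudocount are hypothetical.

```python
from typing import List


def window_single_strandedness(unpaired_prob: List[float], width: int) -> List[float]:
    """Mean unpaired probability of every substring of length `width`.

    `unpaired_prob[i]` is assumed to be the probability that base i is
    unpaired; how it is obtained is outside the scope of this sketch.
    """
    n = len(unpaired_prob)
    if width <= 0 or width > n:
        raise ValueError("width must be between 1 and the sequence length")
    window = sum(unpaired_prob[:width])
    values = [window / width]
    # Slide the window one base at a time, updating the running sum.
    for start in range(1, n - width + 1):
        window += unpaired_prob[start + width - 1] - unpaired_prob[start - 1]
        values.append(window / width)
    return values


def motif_start_prior(values: List[float], pseudocount: float = 0.01) -> List[float]:
    """Normalize single-strandedness values into a prior over motif starts.

    The pseudocount (an illustrative choice) keeps fully paired windows
    from receiving a prior of exactly zero.
    """
    weights = [v + pseudocount for v in values]
    total = sum(weights)
    return [w / total for w in weights]


# Toy input: bases 2-4 are unpaired, the rest are paired.
toy_unpaired = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0]
toy_values = window_single_strandedness(toy_unpaired, width=3)
toy_prior = motif_start_prior(toy_values)
```

In this toy input the window starting at index 2 covers only unpaired bases, so it receives the largest prior weight, while windows that overlap paired bases are down-weighted rather than excluded.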
We have developed a probabilistic modelling approach that takes diverse characteristic binding site properties into account to obtain more accurate representations of binding sites. These properties are modelled as random variables in Bayesian networks, which are capable of dealing with dependencies among binding site properties. Cross-validation on several datasets shows improvements in the false positive error rate and the significance (P-value) of true binding sites.
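How a Bayesian network captures a dependency between two binding site properties can be shown with a small sketch. Everything in it is an assumption made for illustration: the two properties (`motif` and `access`), the network structure in which `access` depends on `motif`, and all probability values are invented and are not taken from the work described above.

```python
import math
from typing import Dict, Tuple

# A two-node network: P(motif) and P(access | motif).
# Keys of the conditional table are (access_state, motif_state).
Model = Tuple[Dict[str, float], Dict[Tuple[str, str], float]]

# Hypothetical model of true binding sites.
SITE_MODEL: Model = (
    {"strong": 0.8, "weak": 0.2},
    {("open", "strong"): 0.9, ("closed", "strong"): 0.1,
     ("open", "weak"): 0.5, ("closed", "weak"): 0.5},
)

# Hypothetical background model.
BACKGROUND_MODEL: Model = (
    {"strong": 0.2, "weak": 0.8},
    {("open", "strong"): 0.3, ("closed", "strong"): 0.7,
     ("open", "weak"): 0.5, ("closed", "weak"): 0.5},
)


def joint_probability(model: Model, motif: str, access: str) -> float:
    """P(motif, access) = P(motif) * P(access | motif)."""
    p_motif, p_access_given_motif = model
    return p_motif[motif] * p_access_given_motif[(access, motif)]


def log_odds(motif: str, access: str) -> float:
    """Log-likelihood ratio of the site model against the background."""
    site = joint_probability(SITE_MODEL, motif, access)
    background = joint_probability(BACKGROUND_MODEL, motif, access)
    return math.log(site / background)
```

A candidate with a strong motif in an open region scores above zero under these made-up tables, and a weak motif in a closed region scores below zero. The conditional table is what lets the model express that the two properties are not independent.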
BioBayesNet is a new web application that allows the easy modeling and classification of biological data using Bayesian networks. To learn Bayesian networks, the user can upload either a set of annotated FASTA sequences or a set of pre-computed feature vectors. In the case of FASTA sequences, the server is able to generate a wide range of sequence and structural features from the sequences. These features are used to learn Bayesian networks. An automatic feature selection procedure assists in selecting discriminative features, providing a (locally) optimal set of features. The output includes several quality measures of the overall network and individual features as well as a graphical representation of the network structure, which allows dependencies between features to be explored. Finally, the learned Bayesian network or another uploaded network can be used to classify new data. BioBayesNet facilitates the use of Bayesian networks in biological sequence analysis and is flexible enough to support modeling and classification applications in various scientific fields. The BioBayesNet server is available at http://biwww3.informatik.uni-freiburg.de:8080/BioBayesNet/.