In bioinformatics, machine learning methods have been used to predict features embedded in biological sequences. Contrary to what is generally assumed, machine learning approaches can also provide new insights into the underlying biology. Here, we demonstrate this by presenting TargetP 2.0, a novel state-of-the-art method for identifying N-terminal sorting signals, which direct proteins to the secretory pathway, mitochondria, and chloroplasts or other plastids. By examining the strongest signals from the attention layer of the network, we find that the second residue of the protein, that is, the one following the initial methionine, has a strong influence on the classification. We observe that two-thirds of chloroplast and thylakoid transit peptides have an alanine in position 2, compared with 20% of other plant proteins. We also note that in fungi and single-celled eukaryotes, fewer than 30% of the targeting peptides have an amino acid in position 2 that allows removal of the N-terminal methionine, compared with 60% of proteins without a targeting peptide. The importance of this feature for predictions has not been highlighted before.
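The position-2 statistics above can be illustrated with a short sketch that, for a set of protein sequences, counts how often the residue after the initial methionine is alanine and how often it is one that permits methionine removal. The permissive set used here (G, A, S, C, T, P, V, i.e. small side chains favoured by methionine aminopeptidase) is an assumption drawn from general knowledge of that enzyme, not a list taken from the TargetP 2.0 work, and the function name is hypothetical.

```python
# Assumption (not from the source): residues with small side chains that are
# generally reported to allow methionine aminopeptidase to remove the initiator Met.
METAP_PERMISSIVE = frozenset("GASCTPV")


def position2_stats(sequences):
    """Return (fraction with Ala at position 2, fraction with a Met-removal-permissive
    residue at position 2), computed over sequences that start with Met.

    Position 2 is 1-based, i.e. the residue right after the initial methionine.
    """
    # Keep only sequences that begin with Met and have a second residue.
    eligible = [s.upper() for s in sequences if len(s) >= 2 and s[0].upper() == "M"]
    if not eligible:
        return 0.0, 0.0
    n = len(eligible)
    ala = sum(1 for s in eligible if s[1] == "A")
    permissive = sum(1 for s in eligible if s[1] in METAP_PERMISSIVE)
    return ala / n, permissive / n


# Toy, made-up sequences for illustration only.
print(position2_stats(["MASSL", "MAVLK", "MKLLT", "MSTRP"]))  # (0.5, 0.75)
```

Running the same function separately on sequences with and without targeting peptides would give the kind of group-wise comparison the abstract reports; the toy input here carries no biological meaning.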
A description is provided of the software algorithms developed for the CMS tracker, both for reconstructing charged-particle trajectories in proton-proton interactions and for using the resulting tracks to estimate the positions of the LHC luminous region and of individual primary-interaction vertices. Despite the very hostile environment at the LHC, the performance obtained with these algorithms is found to be excellent. For tt̄ events under typical 2011 pileup conditions, the average track-reconstruction efficiency for promptly produced charged particles with transverse momenta pT > 0.9 GeV is 94% for pseudorapidities |η| < 0.9 and 85% for 0.9 < |η| < 2.5. The inefficiency is caused mainly by hadrons that undergo nuclear interactions in the tracker material. For isolated muons, the corresponding efficiencies are essentially 100%. For isolated muons of pT = 100 GeV emitted at |η| < 1.4, the resolutions are approximately 2.8% in pT, 10 µm in the transverse impact parameter, and 30 µm in the longitudinal impact parameter. The position resolution achieved for reconstructed primary vertices that correspond to interesting pp collisions is 10–12 µm in each of the three spatial dimensions. The tracking and vertexing software is fast and flexible, and easily adaptable to other functions, such as fast tracking for the trigger or dedicated electron tracking that accounts for bremsstrahlung.
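The efficiency figures quoted above are ratios of simulated charged particles that have an associated reconstructed track to all such particles, restricted to pT > 0.9 GeV and split into |η| bins. The sketch below shows that bookkeeping under simple assumptions: each particle is given as a hypothetical `(pt_gev, eta, matched)` tuple, where `matched` stands in for whatever simulation-to-track association the real software uses, and bin edges are treated as lo ≤ |η| < hi. It also includes the standard pseudorapidity definition η = −ln tan(θ/2). It is not the CMS implementation.

```python
import math

# |eta| bins quoted in the text; boundaries are treated as lo <= |eta| < hi.
ETA_BINS = [(0.0, 0.9), (0.9, 2.5)]


def pseudorapidity(theta):
    """Standard definition eta = -ln(tan(theta / 2)), theta = polar angle in radians."""
    return -math.log(math.tan(theta / 2.0))


def efficiency_by_eta(particles, pt_min=0.9, eta_bins=ETA_BINS):
    """Fraction of simulated particles with a matched reconstructed track, per |eta| bin.

    particles: iterable of (pt_gev, eta, matched) tuples; `matched` is a hypothetical
    flag saying whether a reconstructed track was associated with the particle.
    Only particles with pt > pt_min are counted.
    """
    counts = {b: [0, 0] for b in eta_bins}  # bin -> [matched, total]
    for pt, eta, matched in particles:
        if pt <= pt_min:
            continue
        for lo, hi in eta_bins:
            if lo <= abs(eta) < hi:
                counts[(lo, hi)][1] += 1
                counts[(lo, hi)][0] += int(bool(matched))
                break
    # NaN marks a bin with no eligible particles rather than a zero efficiency.
    return {b: (m / t if t else float("nan")) for b, (m, t) in counts.items()}


# Made-up particles for illustration only.
toy = [(1.5, 0.2, True), (2.0, -0.5, False), (1.2, 1.8, True), (0.5, 0.1, False), (3.0, 3.0, True)]
print(efficiency_by_eta(toy))  # {(0.0, 0.9): 0.5, (0.9, 2.5): 1.0}
```

In the toy input, the particle with pT = 0.5 GeV fails the momentum cut and the one at |η| = 3.0 falls outside both bins, so neither enters any ratio.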
The Database of Protein Disorder (DisProt, URL: https://disprot.org) provides manually curated annotations of intrinsically disordered proteins from the literature. Here we report recent developments with DisProt (version 8), including the doubling of protein entries, a new disorder ontology, improvements to the annotation format and a completely new website. The website includes a redesigned graphical interface, a better search engine, a clearer API for programmatic access and a new annotation interface that integrates text-mining technologies. The new entry format provides greater flexibility, simplifies maintenance and allows more information to be captured from the literature. The new disorder ontology has been formalized and made interoperable by adopting the OWL format, and its structure and term definitions have been improved. The new annotation interface has made the curation process faster and more effective. We recently showed that new DisProt annotations can be effectively used to train and validate disorder predictors. We believe the growth of DisProt will accelerate, contributing to the improvement of function and disorder predictors and thereby helping to illuminate the ‘dark’ proteome.