Data mining is used to extract useful information and to detect patterns in data sets that are often large; it is closely related to knowledge discovery in databases and to data science. In this investigation, we formulate models based on machine learning algorithms to predict student retention at various levels, using higher education data and specifying the relevant variables involved in the modeling. We then use this information to support the process of knowledge discovery. We predict student retention at each of three levels, corresponding to the first, second, and third years of study, and obtain models with an accuracy that exceeds 80% in all scenarios. These models allow us to adequately predict the level at which dropout occurs. Among the machine learning algorithms used in this work are decision trees, k-nearest neighbors, logistic regression, naive Bayes, random forest, and support vector machines, of which random forest performs best. We find that the secondary educational score and the community poverty index are important predictive variables, which have not previously been reported in educational studies of this type. The dropout assessment at various levels reported here is valid for higher education institutions around the world with conditions similar to the Chilean case, where dropout rates affect the efficiency of such institutions. The ability to predict dropout from students' data enables these institutions to take preventive measures to avoid dropouts. In the case study, balancing the majority and minority classes improves the performance of the algorithms.
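The class-balancing step mentioned at the end of this abstract can be illustrated with a minimal sketch. The abstract does not say which balancing method the authors used, so random oversampling of the minority class is an assumption here, and the function name, feature columns, and toy values are all hypothetical.

```python
import random
from collections import Counter


def oversample_minority(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (random oversampling)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Pool of original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: (secondary_score, poverty_index) per student.
rows = [(6.1, 0.10), (5.8, 0.22), (6.5, 0.05), (5.9, 0.18),
        (4.2, 0.41), (4.9, 0.35)]
labels = ["retained", "retained", "retained", "retained",
          "dropout", "dropout"]

balanced_rows, balanced_labels = oversample_minority(rows, labels)
print(Counter(balanced_labels))
```

In practice, balancing of this kind is usually applied to the training split only, so that the evaluation data keep the original class proportions.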
Background: In protein engineering and biotechnology, the discovery and characterization of structural patterns is highly relevant, because these patterns can give fundamental insights into protein-ligand interaction and protein function. This paper presents GSP4PDB, a bioinformatics web tool that enables the user to visualize, search and explore protein-ligand structural patterns within the entire Protein Data Bank. Results: We introduce the notion of graph-based structural pattern (GSP) as an abstract model for representing protein-ligand interactions. A GSP is a graph in which the nodes represent entities of the protein-ligand complex (amino acids and ligands) and the edges represent structural relationships (e.g. ligand-amino acid distances). The novel feature of GSP4PDB is a simple and intuitive graphical interface in which the user can "draw" a GSP and run its search against a relational database containing the structural data of each PDB entry. The results of the search are displayed using the same graph-based representation as the pattern. The user can further explore and analyse the results using a wide range of filters, or download the related information for external post-processing and analysis. Conclusions: GSP4PDB is a user-friendly and efficient application for searching for and discovering new patterns of protein-ligand interaction.
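The GSP idea described above (nodes for ligands and amino acids, edges for distance relationships) can be sketched as a small data structure with a brute-force matcher. This is only an illustration under stated assumptions: GSP4PDB itself runs its searches against a relational database, whereas the sketch below matches in memory, and every class name, field, and coordinate is hypothetical. Each entity is reduced to a single point for simplicity.

```python
from dataclasses import dataclass
from itertools import permutations
from math import dist


@dataclass(frozen=True)
class PatternNode:
    name: str   # label used inside the pattern, e.g. "lig"
    kind: str   # "ligand" or "amino_acid"
    code: str   # ligand or residue code, e.g. "ZN", "HIS"


@dataclass(frozen=True)
class DistanceEdge:
    a: str           # name of the first pattern node
    b: str           # name of the second pattern node
    min_dist: float  # lower distance bound (angstroms)
    max_dist: float  # upper distance bound (angstroms)


@dataclass(frozen=True)
class Entity:
    kind: str
    code: str
    xyz: tuple  # one representative coordinate (a simplification)


def find_matches(nodes, edges, entities):
    """Try every assignment of distinct entities to pattern nodes and keep
    those whose types match and whose distances satisfy every edge."""
    matches = []
    for combo in permutations(entities, len(nodes)):
        if any(n.kind != e.kind or n.code != e.code
               for n, e in zip(nodes, combo)):
            continue
        placed = {n.name: e for n, e in zip(nodes, combo)}
        if all(ed.min_dist <= dist(placed[ed.a].xyz, placed[ed.b].xyz) <= ed.max_dist
               for ed in edges):
            matches.append(placed)
    return matches


# Hypothetical pattern: a zinc ligand close to one HIS and one ASP.
nodes = [PatternNode("lig", "ligand", "ZN"),
         PatternNode("his", "amino_acid", "HIS"),
         PatternNode("asp", "amino_acid", "ASP")]
edges = [DistanceEdge("lig", "his", 1.5, 3.0),
         DistanceEdge("lig", "asp", 1.5, 3.0)]

# Hypothetical structure: the second HIS is too far from the zinc to match.
entities = [Entity("ligand", "ZN", (0.0, 0.0, 0.0)),
            Entity("amino_acid", "HIS", (2.1, 0.0, 0.0)),
            Entity("amino_acid", "ASP", (0.0, 2.0, 0.0)),
            Entity("amino_acid", "HIS", (9.0, 0.0, 0.0))]

matches = find_matches(nodes, edges, entities)
print(len(matches))
```

Trying every permutation is only workable for toy inputs. A pattern with two nodes of the same type would also be reported once per symmetric assignment, which a real search would need to deduplicate.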
Valdebenito-Maturana, Braulio, Reyes-Suarez, Jose Antonio, Henriquez, Jaime, Holmes, David S., Quatrini, Raquel, Pohl, Ehmke and Arenas-Salinas, Mauricio (2017). Mutantelec: An In Silico mutation simulation platform for comparative electrostatic potential profiling of proteins. Journal of Computational Chemistry 38(7): 467-474. Accepted version; published in final form at https://doi.org/10.1002/jcc.24712.
The electrostatic potential plays a key role in many biological processes, such as determining the affinity of a ligand for a given protein target, and it is responsible for the catalytic activity of many enzymes. Understanding the effect that amino acid mutations will have on the electrostatic potential of a protein will