Multiple linear regression analysis is widely used to link an outcome with predictors in order to better understand the behaviour of the outcome of interest. Usually, under the assumption that the errors follow a normal distribution, the coefficients of the model are estimated by minimizing the sum of squared deviations. A new approach based on maximum likelihood estimation is proposed for finding the coefficients of linear models with two predictors without any restrictive assumptions on the distribution of the errors. The algorithm was developed, implemented, and tested as a proof of concept using fourteen sets of compounds by investigating the link between activity/property (as outcome) and structural feature information incorporated by molecular descriptors (as predictors). The results on real data demonstrated that in all investigated cases the power of the error is significantly different from the convenient value of two when the Gauss-Laplace distribution was used to relax the restrictive assumption of normally distributed errors. Therefore, the Gauss-Laplace distribution of the error could not be rejected, while the hypothesis that the power of the error from the Gauss-Laplace distribution is normally distributed also failed to be rejected.
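As an illustration of the idea in the abstract above, the following is a minimal sketch of a maximum likelihood fit for a linear model with two predictors whose errors follow a generalized normal (Gauss-Laplace) distribution with a free power parameter. It is not the authors' algorithm: the function name `fit_gauss_laplace`, the log-parameterization, the ordinary least squares starting point, and the Nelder-Mead optimizer are all assumptions made for this example. The error density is taken from `scipy.stats.gennorm`, where a power of 2 corresponds to the normal case and a power of 1 to the Laplace case.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gennorm


def fit_gauss_laplace(X, y):
    """Fit y = b0 + b1*x1 + b2*x2 by maximum likelihood, assuming
    generalized normal errors. Returns (coefficients, scale, power).

    Hypothetical helper written for illustration only.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])  # design matrix with intercept

    # Ordinary least squares gives the starting coefficients.
    b_ols, *_ = np.linalg.lstsq(A, y, rcond=None)
    sigma0 = np.std(y - A @ b_ols)

    def negative_log_likelihood(theta):
        coefs = theta[:3]
        scale = np.exp(theta[3])  # exp keeps the scale positive
        power = np.exp(theta[4])  # exp keeps the power positive
        residuals = y - A @ coefs
        return -gennorm.logpdf(residuals, power, loc=0.0, scale=scale).sum()

    # With power 2, gennorm's scale equals sigma * sqrt(2).
    start = np.concatenate([b_ols, [np.log(sigma0 * np.sqrt(2.0)), np.log(2.0)]])
    result = minimize(negative_log_likelihood, start, method="Nelder-Mead",
                      options={"maxiter": 5000})
    return result.x[:3], float(np.exp(result.x[3])), float(np.exp(result.x[4]))
```

A fitted power close to 2 would be consistent with normally distributed errors, while a clearly different value would point to a heavier or lighter tailed error distribution.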
Various methods (Hartree–Fock methods, semi-empirical methods, Density Functional Theory, Molecular Mechanics) used to optimize a molecular structure share the same basic approach but differ in the mathematical approximations used. The geometry optimization procedure calculates the energy at an initial geometry of a molecule and then searches for a new geometry with a lower energy. Using the 3D structures collected from the PubChem database, geometry optimization calculations were performed for 20 amino acids with several methods. The purpose of the study was to analyze these 39 methods, to find the relationships between them, and to determine which to use under different circumstances. Cluster analysis and principal component analysis were performed to evaluate the similarities between the different methods. According to the analysis, the methods can be classified into three main groups and can be selected accordingly to solve different types of problems.
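The optimization loop described above (evaluate the energy at a starting geometry, then search for a geometry with lower energy) can be sketched with a toy model. This is not one of the methods compared in the study: the harmonic bond energy, the force constant `K`, the bond list `BONDS`, and the use of BFGS from `scipy.optimize` are assumptions chosen only to keep the example short and runnable.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical three-atom chain: (atom i, atom j, equilibrium bond length).
BONDS = [(0, 1, 1.0), (1, 2, 1.0)]
K = 100.0  # hypothetical harmonic force constant


def energy(flat_coords):
    """Toy energy: sum of harmonic bond-stretch terms."""
    xyz = flat_coords.reshape(-1, 3)
    total = 0.0
    for i, j, r0 in BONDS:
        r = np.linalg.norm(xyz[i] - xyz[j])
        total += 0.5 * K * (r - r0) ** 2
    return total


def optimize_geometry(initial_xyz):
    """Return (optimized coordinates, initial energy, final energy)."""
    x0 = np.asarray(initial_xyz, dtype=float).ravel()
    result = minimize(energy, x0, method="BFGS")
    return result.x.reshape(-1, 3), energy(x0), float(result.fun)
```

Quantum chemical and molecular mechanics codes follow the same pattern but replace the toy `energy` function with their own energy model, which is where the methods differ.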
Protein alignment finds its application in refining the results of sequence alignment and in understanding protein function. A previous study aligned single molecules by minimizing the sums of the squares of the eigenvalues obtained for the antisymmetric Cartesian coordinate distance matrices Dx and Dy. Our program uses this approach to search for similarities between amino acids by comparing the sums of the squares of the eigenvalues associated with the Dx, Dy, and Dz distance matrices. These matrices are obtained after removing atoms that could lead to low similarity. Candidates are aligned, and trilateration is used to reattach all previously stripped atoms. The TM-score is the scoring function used to choose the best alignment from the supplied candidates. Twenty essential amino acids that take many forms in nature were selected for comparison. The correct alignment is among the candidates considered by the alignment algorithm most of the time. It was numerically detected by the TM-score 70% of the time, on average, and 15% more cases with close scores can be easily distinguished by human observation.
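The per-axis quantity mentioned above can be sketched as follows. The abstract does not define the matrices, so this example assumes that entry (i, j) of Dx is the signed difference x_i − x_j (and likewise for Dy and Dz), which makes each matrix antisymmetric. The function names are hypothetical, and the sketch covers only the descriptor itself, not the alignment search, the trilateration step, or the TM-score.

```python
import numpy as np


def coordinate_difference_matrix(values):
    """Antisymmetric matrix with entries v_i - v_j (assumed definition)."""
    v = np.asarray(values, dtype=float)
    return v[:, None] - v[None, :]


def sum_squared_eigenvalues(matrix):
    """Sum of the squared eigenvalues of a square matrix.

    For a real antisymmetric matrix the eigenvalues are purely imaginary,
    so the sum is real and non-positive.
    """
    eigenvalues = np.linalg.eigvals(matrix)
    return float(np.real(np.sum(eigenvalues ** 2)))


def axis_descriptors(xyz):
    """Return the sums of squared eigenvalues for Dx, Dy, and Dz."""
    xyz = np.asarray(xyz, dtype=float)
    return tuple(
        sum_squared_eigenvalues(coordinate_difference_matrix(xyz[:, k]))
        for k in range(3)
    )
```

Under this assumed definition each descriptor equals the negative sum of the squared pairwise coordinate differences along that axis, so it is unchanged by translation but does depend on how the molecule is rotated, which is what makes it usable as an orientation criterion.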
In this paper, a model has been developed that can estimate the composition of phenolic compounds based on censored data and the total equivalent antioxidant capacity (TEAC) measured by three different methods. A contingency of 32 plants was analyzed for total phenolic content (TPC), caffeic acid, p-coumaric acid, ferulic acid, neochlorogenic acid, and TEAC. TEAC was measured by three different methods: ABTS (2,2'-azinobis-(3-ethylbenzthiazoline-6-sulfonic acid)), DPPH (1,1-diphenyl-2-picrylhydrazyl radical), and FRAP (ferric reducing/antioxidant power). Five values of caffeic acid, thirteen of p-coumaric acid, seven of ferulic acid, and nineteen of neochlorogenic acid were missing. Due to the complexity of the compounds, data mining and computational methods are required to determine the missing data. The method developed for independent variables was used to estimate the missing data. The contingency was filled with the calculated values obtained with all alternatives. The performance of each approach in estimating and/or predicting the phenolic composition is shown in comparison with the other approaches used. The results indicated a strong correlation and mutual influence between the data analyzed.