2014
DOI: 10.1016/j.chemolab.2014.05.010

A novel variable reduction method adapted from space-filling designs

Citations: cited by 77 publications (55 citation statements)
References: 29 publications
“…Descriptor matrix was refined by eliminating descriptors having correlation lower than 0.3 leaving 4248 descriptors. V-WSP variable reduction MATLAB routine which is an unsupervised variable reduction based on V-WSP algorithm (Ballabio et al, 2014) was subsequently applied on 4248 descriptors with 0.85 absolute correlations which gave 357 most suitable descriptors. The dataset was split into training set of 44 compounds and test set of 13 compounds.…”
Section: Variable Selection and Model Development | mentioning
confidence: 99%
“…Following descriptor generation, the datasets were curated by removing variables with null values as well as zero variance variables, i.e., descriptors with a standard deviation <0.0001. This is referred to as the “Original dataset.” The original datasets were then subjected to variable—Wootton, Sergent, Phan‐Tan‐Luu algorithm (V‐WSP) reduction, an unsupervised variable reduction method that allows for the elimination of variables based on multicollinearity. This was performed via a grid search to find the correlation coefficient threshold for the descriptors for which the Procrustes index was lower than 0.2.…”
Section: Methods | mentioning
confidence: 99%
“…The Procrustes index is a statistical measure that allows for the assessment of the degree of comparability between the original and reduced datasets based on informational content. A Procrustes value of 1 indicates complete dissimilarity and 0 indicates that both datasets are identical. The grid search was carried out separately for the descriptors of different domains to avoid removing chance correlations due to the presence of similar amino acids in different domains of the variable regions, i.e., VH and VL domains.…”
Section: Methods | mentioning
confidence: 99%
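
The statement above describes the Procrustes index only in words (0 for identical datasets, 1 for complete dissimilarity). The following is a minimal sketch of one way such an index could be computed, not the implementation used in the cited work. It assumes that each dataset is first autoscaled and projected onto its leading principal components, so that the two matrices have the same shape; the function names `pca_scores` and `procrustes_index` and the default of two components are illustrative choices.

```python
import numpy as np

def pca_scores(X, n_components):
    """Autoscale X and return its first n_components PCA score vectors.

    Assumes no column of X has zero variance (such columns should be
    removed beforehand, otherwise the division below fails).
    """
    Xc = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def procrustes_index(X_full, X_reduced, n_components=2):
    """Procrustes disparity between the PCA scores of two datasets.

    Both score matrices are scaled to unit Frobenius norm; after the
    optimal rotation and scaling, the residual sum of squares equals
    1 - (sum of singular values of A^T B)^2, which lies in [0, 1]
    up to floating-point rounding.
    """
    A = pca_scores(X_full, n_components)
    B = pca_scores(X_reduced, n_components)
    A = A / np.linalg.norm(A)
    B = B / np.linalg.norm(B)
    singular_values = np.linalg.svd(A.T @ B, compute_uv=False)
    return 1.0 - singular_values.sum() ** 2
```

Under these assumptions, a grid search like the one described would evaluate `procrustes_index(X, X[:, kept])` for each candidate correlation threshold and retain thresholds for which the value stays below the chosen limit (0.2 in the statement above).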
“…al. [29]. After running the V-WSP tool successfully, a log file is generated that consists of names of all the descriptors which are removed based on variance cut-off (i.e.…”
Section: 4 Data Pre-treatment - V-WSP Tool | mentioning
confidence: 99%
“…In this step, variance is calculated for each descriptor j (where, …,m) and the descriptors with variance less than user specified variance cut-off value are removed. Further, the V-WSP algorithm is applied on the modified data matrix with n rows and p columns (p = number of descriptors remaining after removing constant descriptors), to remove intercorrelated descriptors. The V-WSP algorithm mentioned below is according to the literature [29]. 1.…”
mentioning
confidence: 99%
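
The two-stage pre-treatment described in the statement above (a variance cut-off followed by removal of intercorrelated descriptors) can be illustrated with a short sketch. This is not the V-WSP algorithm, whose steps are truncated in the quoted text; the second stage here is a simpler greedy correlation filter swapped in for illustration. The function names and the example cut-off values are assumptions.

```python
import numpy as np

def variance_filter(X, var_cutoff):
    """Return indices of columns whose sample variance is >= var_cutoff."""
    variances = X.var(axis=0, ddof=1)
    return np.flatnonzero(variances >= var_cutoff)

def greedy_correlation_filter(X, corr_threshold):
    """Greedy stand-in for the intercorrelation step (not V-WSP itself).

    Scans columns left to right and keeps a column only if its absolute
    Pearson correlation with every already-kept column is at most
    corr_threshold. Expects constant columns to have been removed,
    since their correlation is undefined.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= corr_threshold for k in kept):
            kept.append(j)
    return np.array(kept, dtype=int)

# Example: n = 5 objects, 4 descriptors (near-duplicate and constant columns included).
X = np.array([
    [1.0,  2.0, 3.0,  1.0],
    [2.0,  4.0, 3.0, -1.0],
    [3.0,  6.0, 3.0,  1.0],
    [4.0,  8.0, 3.0, -1.0],
    [5.0, 10.1, 3.0,  1.0],
])
non_constant = variance_filter(X, var_cutoff=1e-4)
kept_local = greedy_correlation_filter(X[:, non_constant], corr_threshold=0.85)
kept_original = non_constant[kept_local]  # indices into the original matrix
```

Unlike this left-to-right scan, the result of a space-filling selection depends on how the starting variable and each subsequent variable are chosen, so the sketch should be read only as an illustration of the threshold idea.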