2022
DOI: 10.3390/math10152671

PCDM and PCDM4MP: New Pairwise Correlation-Based Data Mining Tools for Parallel Processing of Large Tabular Datasets

Abstract: The paper describes PCDM and PCDM4MP as new tools and commands capable of exploring large datasets. They select variables based on identifying the absolute values of Pearson’s pairwise correlation coefficients between a chosen response variable and any other existing in the dataset. In addition, for each pair, they also report the corresponding significance and the number of non-null intersecting observations, and all this reporting is performed in a record-oriented manner (both source and output). Optionally,…

Cited by 7 publications (3 citation statements)
References 62 publications
“…After performing the second phase meant for filters starting from pairwise correlation coefficients as absolute values (≥0.1), together with their significance (p < 0.001) and support (at least a third of the data or N > 142,150), 19 variables resulted as indicated in Table 1. The same results were more easily achieved using the PCDM command (Stata script at https://tinyurl.com/25pd6mx6, accessed on 30 January 2023) in Stata [73] and three parameters (minacc (0.1) minn (142,150) maxp (0.001)) corresponding to those three filters above. The next concern before going to the third selection step (dedicated to cross-validations on specified criteria) was to recode ("nt" call sign meaning null treatment) the remaining variables (all 19 in Table 1).…”
Section: Results (mentioning)
confidence: 60%
“…The 2nd selection round stood on a set of filters applied. First, they met a minimum threshold of 0.1 [72] for the absolute values of pairwise correlation coefficients [73] between each recoded variable from the previous step and the one that was to be analyzed. In addition, there was a minimum value of the corresponding significance (min p = 0.001) and a minimum support afferent to a minimum number of valid observations (at least a third of the total number) for each pair.…”
Section: Methods (mentioning)
confidence: 99%
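
The three filters described in these statements (a minimum absolute correlation, a significance cutoff, and a minimum number of valid paired observations) can be illustrated with a minimal Python sketch. This is not the PCDM implementation and does not reproduce its Stata syntax. The function names `pearson_pairwise` and `screen`, and the parameters `min_abs_r`, `max_p`, and `min_n`, are hypothetical and only loosely mirror the quoted filters. The p-value is an assumption as well: it uses the Fisher z normal approximation rather than the exact t-distribution test, to keep the example dependency-free.

```python
import math


def pearson_pairwise(xs, ys):
    """Return (r, approximate two-sided p, n) over rows where both values are non-null.

    Returns None when fewer than 4 complete pairs exist or a column is constant.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 4:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if sxx == 0 or syy == 0:
        return None
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    r = max(-1.0, min(1.0, sxy / math.sqrt(sxx * syy)))
    if abs(r) == 1.0:
        p = 0.0
    else:
        # Assumption: Fisher z normal approximation, not the exact t-test.
        z = math.atanh(r) * math.sqrt(n - 3)
        p = math.erfc(abs(z) / math.sqrt(2))
    return r, p, n


def screen(data, response, min_abs_r=0.1, max_p=0.001, min_n=0):
    """Keep variables whose correlation with `response` passes all three filters."""
    kept = []
    for name, column in data.items():
        if name == response:
            continue
        result = pearson_pairwise(column, data[response])
        if result is None:
            continue
        r, p, n = result
        if abs(r) >= min_abs_r and p < max_p and n > min_n:
            kept.append((name, r, p, n))
    # Strongest absolute correlations first.
    kept.sort(key=lambda item: -abs(item[1]))
    return kept


# Toy data: one strongly related column, one unrelated, one mostly missing.
y = [float(i) for i in range(60)]
data = {
    "y": y,
    "strong": [2 * v + 1 for v in y],
    "noise": [1.0 if i % 2 == 0 else -1.0 for i in range(60)],
    "sparse": [2 * v + 1 if i % 3 == 0 else None for i, v in enumerate(y)],
}
selected = screen(data, "y", min_n=30)
```

In this toy run, the `noise` column fails the correlation threshold and the `sparse` column fails the support threshold, because only 20 of its 60 rows overlap with the response. Each surviving entry carries the coefficient, the approximate significance, and the number of paired observations, echoing the per-pair reporting mentioned in the abstract.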
“…Multicollinearity is considered a phenomenon that supports the model of regression That is utilised through the support of the independent variable. However, all these factors help to generate the accurate outcome of the research (Homocianu & Airinei, 2022). The concerning factor is mainly pointed out by the values that lie between -1 and 1 and r. All these factors initially display the negative interrelationship then it shows the positive interrelationship that can be denoted by different numerical factors.…”
Section: Descriptive Statistics For the Study Variables (mentioning)
confidence: 99%