2014
DOI: 10.3389/fpsyg.2014.00118

Supervised classification in the presence of misclassified training data: a Monte Carlo simulation study in the three group case

Abstract: Statistical classification of phenomena into observed groups is very common in the social and behavioral sciences. Statistical classification methods, however, are affected by the characteristics of the data under study. Statistical classification can be further complicated by initial misclassification of the observed groups. The purpose of this study is to investigate the impact of initial training data misclassification on several statistical classification and data mining techniques. Misclassification condi…

Citations: cited by 7 publications (10 citation statements)
References: 66 publications
“…Analyzing groups with different numerical ratios (such as 10:90 or 25:75; Table 2) can significantly impair the discriminant potential of LDA, with higher probability of samples being reclassified within the larger group, and less likely to be classified in the smaller group (Holden et al, 2011; Bolin and Finch, 2014). Thus, we excluded four stands classified in the early stage by CONAMA and Scenario A schemes of the performance analysis and focused our analysis on the discrimination between the intermediate and late stages.…”
Section: Classification Accuracy Assessment: Linear Discriminant Analysis (citation type: mentioning)
confidence: 99%
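
The group-ratio effect described in the statement above can be illustrated with a minimal sketch. It is not taken from the cited paper or from the citing study: it assumes a single predictor, two normally distributed groups with a common standard deviation, and an LDA rule that uses the group proportions as priors. The means, standard deviation, and the function name `lda_recalls` are hypothetical choices for illustration only.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lda_recalls(prior_small, mu_small=0.0, mu_large=1.0, sigma=1.0):
    """Two-group, one-predictor LDA with known parameters (all values assumed).

    Returns the decision threshold and the proportion of each group that
    is classified correctly. Requires mu_large > mu_small.
    """
    prior_large = 1.0 - prior_small
    # Equating the two linear discriminant scores gives the cut point:
    # the midpoint of the means, shifted by the log prior ratio.
    threshold = (mu_small + mu_large) / 2.0 + (
        sigma ** 2 * math.log(prior_small / prior_large) / (mu_large - mu_small)
    )
    # Cases below the threshold are assigned to the smaller group.
    recall_small = normal_cdf((threshold - mu_small) / sigma)
    recall_large = 1.0 - normal_cdf((threshold - mu_large) / sigma)
    return threshold, recall_small, recall_large


if __name__ == "__main__":
    for prior_small in (0.50, 0.25, 0.10):
        t, r_small, r_large = lda_recalls(prior_small)
        print(
            f"ratio {prior_small:.2f}:{1 - prior_small:.2f}  "
            f"threshold={t:.3f}  "
            f"small-group accuracy={r_small:.3f}  "
            f"large-group accuracy={r_large:.3f}"
        )
```

Under these assumptions, the threshold moves toward and past the smaller group's mean as the ratio becomes more unequal, so fewer of its cases are assigned to it while the larger group absorbs more of them. Real data with estimated parameters, several predictors, or mislabeled training cases may behave differently.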
“…While no unitary sample size conditions were employed, these effects illustrate a pattern somewhat different from single-level classifiers. That is, previous studies comparing single-level classifiers (e.g., Bolin & Finch, 2014; Holden et al, 2011; Lei & Koehly, 2003) have found that as the sample size increases, estimates tend to become more stable but do not necessarily increase appreciably in accuracy. However, in the multilevel context, it is evident that a higher number of clusters and cases per cluster is associated with noticeable increases in accuracy.…”
Section: Discussion (citation type: mentioning)
confidence: 95%
“…Sela and Simonoff (2012) utilized varied L1 and L2 sample sizes with the former featuring values from 50 to 2,000 and the latter featuring values from 10 to 100; Ngufor et al (2019) also incorporated varied numbers of clusters and time points from 10 to 60 clusters and 1 to 4 time points (within-cluster sample size). Additionally, Bolin and Finch (2014), Holden et al (2011), and Lei and Koehly (2003) each considered varied parameters in single-level classification situations including varied group size ratios and group separation, among other factors, in simulated data. The salient effects of the ICC must also be considered in multilevel contexts, as this is precisely the factor that necessitates use of mixed effects models (with LeBreton & Senter’s [2008] recommendation being any ICC > 0.05 necessitating multilevel models; Raudenbush & Bryk 2002).…”
Section: Methods (citation type: mentioning)
confidence: 99%