2010
DOI: 10.1038/tpj.2010.56

k-Nearest neighbor models for microarray gene expression analysis and clinical outcome prediction

Abstract: In the clinical application of genomic data analysis and modeling, a number of factors contribute to the performance of disease classification and clinical outcome prediction. This study focuses on the k-nearest neighbor (KNN) modeling strategy and its clinical use. Although KNN is simple and clinically appealing, large performance variations were found among experienced data analysis teams in the MicroArray Quality Control Phase II (MAQC-II) project. For clinical end points and controls from breast cancer, ne…

Cited by 111 publications (73 citation statements)
References 45 publications
“…The most common metric used in bioinformatics is the absolute Pearson coefficient. For clinical end points and controls from breast cancer, neuroblastoma, and multiple myeloma, the authors in [38] generated 463,320 kNN models by varying the feature ranking method, number of features, distance metric, number of neighbors, vote weighting, and decision threshold. They identified the factors that contribute to the performance variation observed in the MAQC-II project.…”
Section: Classification Techniques (mentioning)
confidence: 99%
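
The size of the model space described in this statement follows directly from the Cartesian product of the design choices. The Python sketch below is purely hypothetical: the grid values are illustrative stand-ins, not the settings that produced the 463,320 models in [38], and scikit-learn is used only as a convenient stand-in implementation.

# Hypothetical sketch (not the authors' code): enumerating a grid of kNN
# configurations over the six factors varied in [38]. All grid values are
# illustrative, not the actual settings used in the cited study.
from itertools import product

from sklearn.neighbors import KNeighborsClassifier

ranking_methods = ["sam_d_value", "fold_change", "p_value"]   # feature ranking method
feature_counts = [5, 10, 25, 50, 100]                         # number of features
distance_metrics = ["euclidean", "correlation"]               # distance metric
neighbor_counts = [1, 3, 5, 7, 9, 11]                         # number of neighbors
vote_weightings = ["uniform", "distance"]                     # vote weighting
decision_thresholds = [0.3, 0.4, 0.5, 0.6, 0.7]               # decision threshold

grid = list(product(ranking_methods, feature_counts, distance_metrics,
                    neighbor_counts, vote_weightings, decision_thresholds))
print(f"{len(grid)} candidate kNN configurations in this toy grid")

def build_knn(metric, k, weighting):
    """Return an unfitted scikit-learn kNN classifier for one grid point.
    Feature ranking/selection and the decision threshold (applied to the
    class-vote proportion from predict_proba) are handled outside this call."""
    return KNeighborsClassifier(n_neighbors=k, metric=metric,
                                weights=weighting, algorithm="brute")

# Example: instantiate the classifier for the first grid point.
method, n_features, metric, k, weighting, threshold = grid[0]
model = build_knn(metric, k, weighting)

Even this toy grid yields thousands of candidate models, which is why small choices such as vote weighting or decision threshold can account for a large share of the performance variation.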
“…Since Parry et al (2010) showed that their three feature ranking methods (significance analysis of microarrays (SAM) d-value, fold-change ranking with a P-value threshold of 0.05, and P-value ranking with a fold-change threshold of 1.5) performed similarly well in their study of the KNN modeling strategy, and our main interest is to find a classifier that predicts well in cross-platform classification, only one method, SAM, is applied to rank the differentiability of genes.…”
Section: Prediction Performance of the Classifiers (mentioning)
confidence: 99%
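
As a rough illustration of two of the ranking criteria named in this statement, the hypothetical Python sketch below ranks synthetic genes by fold change with a P-value gate, and by P-value with a fold-change gate. The SAM d-value (a moderated t-like statistic with a fudge factor in the denominator) is not reproduced here, and the data are random, assumed to be log2-scale expression values.

# Hypothetical sketch: fold-change ranking with a P-value threshold, and
# P-value ranking with a fold-change threshold, on synthetic log2 data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
expr = rng.normal(size=(1000, 40))          # toy data: 1000 genes x 40 samples
labels = np.array([0] * 20 + [1] * 20)      # two classes of 20 samples each

group0, group1 = expr[:, labels == 0], expr[:, labels == 1]
log_fc = group1.mean(axis=1) - group0.mean(axis=1)       # log2 fold change
_, p_values = stats.ttest_ind(group1, group0, axis=1)    # per-gene t-test

# Fold-change ranking, keeping only genes with P < 0.05.
fc_ranked = np.argsort(-np.abs(log_fc))
fc_ranked = fc_ranked[p_values[fc_ranked] < 0.05]

# P-value ranking, keeping only genes with |fold change| > 1.5.
p_ranked = np.argsort(p_values)
p_ranked = p_ranked[np.abs(log_fc[p_ranked]) > np.log2(1.5)]

print("top genes by fold change (P < 0.05):", fc_ranked[:10])
print("top genes by P-value (FC > 1.5):    ", p_ranked[:10])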
“…The first one, called the Training set, is meant to be used for classifier training, while the second, named the Validation set, is to be used as an independent test set to validate prediction performance. An additional advantage when analyzing the MAQC data is that a diverse collection of analysis teams has worked on the same data, following the same evaluation procedure and publishing their results 24,22,19. An accurate benchmark of a new algorithm can therefore be made to see how well it performs compared with a considerable number of state-of-the-art alternatives.…”
Section: Data (mentioning)
confidence: 99%
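
The protocol described in this statement (train on one predefined set, then validate on an independent set with an agreed metric) can be sketched as follows. The random split of synthetic data and the MCC metric are illustrative assumptions, not the cited teams' actual data, split, or scoring procedure.

# Hypothetical sketch: fit on a training set, score once on an independent
# validation set. A random split of synthetic data stands in for the
# predefined MAQC Training / Validation sets.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import matthews_corrcoef

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 50))                            # 200 samples x 50 features
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# Stand-in for the predefined Training / Validation split.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)

clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("validation MCC:", matthews_corrcoef(y_val, clf.predict(X_val)))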