The k-nearest neighbor (kNN) method is a popular classification method in data mining and statistics because of its simple implementation and strong classification performance. However, it is impractical for traditional kNN methods to assign a single fixed k value (even one set by experts) to all test samples. Previous solutions assign different k values to different test samples via cross validation, but this is usually time-consuming. This paper proposes a kTree method to learn different optimal k values for different test/new samples, by introducing a training stage into kNN classification. Specifically, in the training stage, the kTree method first learns optimal k values for all training samples with a new sparse reconstruction model, and then constructs a decision tree (namely, the kTree) from the training samples and their learned optimal k values. In the test stage, the kTree quickly outputs the optimal k value for each test sample, and kNN classification is then conducted using the learned optimal k value and all training samples. As a result, the proposed kTree method has a running cost similar to that of traditional kNN methods, which assign a fixed k value to all test samples, but higher classification accuracy. Moreover, the proposed kTree method has a lower running cost than, but classification accuracy similar to, recent kNN methods that assign different k values to different test samples. This paper further proposes an improved version of the kTree method (namely, the k*Tree method) that speeds up the test stage by additionally storing information about the training samples in the leaf nodes of the kTree, such as the training samples located in each leaf node, their kNNs, and the nearest neighbors of these kNNs. We call the resulting decision tree the k*Tree; it enables kNN classification to be conducted using only a subset of the training samples stored in the leaf nodes, rather than all training samples as in the recent kNN methods. This further reduces the running cost of the test stage.
Finally, experimental results on 20 real data sets showed that our proposed methods (i.e., kTree and k*Tree) are much more efficient than the compared methods on classification tasks.
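As a rough illustration of the kTree idea, the sketch below is a hypothetical stand-in, not the paper's implementation: it estimates a per-sample "optimal" k by leave-one-out evaluation (in place of the paper's sparse reconstruction model), trains a decision tree that maps a sample to its k value, and then looks up k for each test sample before running kNN.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeRegressor

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Training stage: estimate an "optimal" k for every training sample.
# Leave-one-out evaluation over candidate k values stands in for the
# paper's sparse reconstruction model.
candidate_ks = [1, 3, 5, 7, 9]
best_k = np.empty(len(X_tr), dtype=int)
for i in range(len(X_tr)):
    mask = np.ones(len(X_tr), dtype=bool)
    mask[i] = False
    hits = [
        KNeighborsClassifier(n_neighbors=k)
        .fit(X_tr[mask], y_tr[mask])
        .predict(X_tr[i : i + 1])[0] == y_tr[i]
        for k in candidate_ks
    ]
    best_k[i] = candidate_ks[int(np.argmax(hits))]

# Build the "kTree": a decision tree mapping a sample to its k value.
ktree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_tr, best_k)

# Test stage: look up k for each test sample, then run kNN with that k.
preds = []
for x in X_te:
    k = max(1, int(round(ktree.predict(x.reshape(1, -1))[0])))
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    preds.append(knn.predict(x.reshape(1, -1))[0])
accuracy = float(np.mean(np.array(preds) == y_te))
```

The k*Tree variant would additionally cache, in each leaf, the training samples it contains and their neighbors, so the test-stage kNN search runs over that subset rather than the full training set.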
An adaptive IMRT plan quality evaluation tool based on machine learning has been developed; it estimates OAR sparing and provides a reference for evaluating adaptive radiation therapy (ART).
Purpose: Intensity-modulated proton therapy (IMPT) is highly sensitive to uncertainties in beam range and patient setup. Conventionally, these uncertainties are dealt with by using a geometrically expanded planning target volume (PTV). In this paper, the authors evaluated a robust optimization method that deals with the uncertainties directly during spot weight optimization to ensure clinical target volume (CTV) coverage without using a PTV. The authors compared the two methods for a population of head and neck (H&N) cancer patients. Methods: Two sets of IMPT plans were generated for 14 H&N cases, one PTV-based and conventionally optimized, the other CTV-based and robustly optimized. For the PTV-based conventionally optimized plans, the uncertainties were accounted for by expanding the CTV to the PTV via margins and delivering the prescribed dose to the PTV. For the CTV-based robustly optimized plans, spot weight optimization was guided to directly reduce the discrepancy among doses under extreme setup and range uncertainties, while delivering the prescribed dose to the CTV rather than to a PTV. For each of these plans, the authors calculated dose distributions under various uncertainty settings. The root-mean-square dose (RMSD) for each voxel was computed, and the area under the RMSD-volume histogram curve (AUC) was used to compare plan robustness. Data derived from the dose-volume histograms of the worst-case and nominal doses were used to evaluate plan optimality. The plan evaluation metrics were then averaged over the 14 cases and compared with two-sided paired t tests. Results: CTV-based robust optimization led to more robust plans (i.e., smaller AUCs) for both targets and organs. Under both the worst-case and nominal scenarios, CTV-based robustly optimized plans showed better target coverage (i.e., greater D95%), improved dose homogeneity (i.e., smaller D5% − D95%), and lower or equivalent doses to organs at risk.
Conclusions: CTV-based robust optimization provided significantly more robust dose distributions to targets and organs than PTV-based conventional optimization in H&N cancer treated with IMPT. Eliminating the PTV and planning directly on the CTV provided better or equivalent normal tissue sparing.
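The RMSD-volume histogram AUC metric described above can be sketched as follows. The toy dose arrays are invented for illustration; a real evaluation would use per-voxel doses recomputed under each setup and range uncertainty scenario.

```python
import numpy as np

def rmsd_per_voxel(scenario_doses):
    """RMSD of each voxel's dose across uncertainty scenarios.
    scenario_doses: array of shape (n_scenarios, n_voxels)."""
    mean = scenario_doses.mean(axis=0)
    return np.sqrt(((scenario_doses - mean) ** 2).mean(axis=0))

def rvh_auc(rmsd, n_bins=100):
    """Area under the RMSD-volume histogram: the fraction of voxels whose
    RMSD meets or exceeds each threshold, integrated over thresholds
    (trapezoidal rule). A smaller AUC indicates a more robust plan."""
    thresholds = np.linspace(0.0, rmsd.max(), n_bins)
    vf = np.array([(rmsd >= t).mean() for t in thresholds])
    return float(np.sum(np.diff(thresholds) * (vf[:-1] + vf[1:]) / 2.0))

# Toy 3-voxel structure under three uncertainty scenarios (doses in Gy).
robust = np.array([[60.0, 60.0, 60.0],
                   [60.5, 60.0, 59.5],
                   [59.5, 60.0, 60.5]])
conventional = np.array([[60.0, 60.0, 60.0],
                         [63.0, 60.0, 57.0],
                         [57.0, 60.0, 63.0]])
auc_robust = rvh_auc(rmsd_per_voxel(robust))
auc_conventional = rvh_auc(rmsd_per_voxel(conventional))
```

In this toy case the plan whose voxel doses vary less across scenarios yields the smaller AUC, matching the paper's use of AUC as a relative robustness score.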
The k Nearest Neighbor (kNN) method has been widely used in data mining and machine learning applications due to its simple implementation and distinguished performance. However, assigning the same k value to all test data, as previous kNN methods do, has been shown to make these methods impractical in real applications. This article proposes to learn a correlation matrix that reconstructs test data points from the training data, so that different k values are assigned to different test data points; we refer to this as Correlation Matrix kNN (CM-kNN for short) classification. Specifically, a least-squares loss function is employed to minimize the error of reconstructing each test data point from all training data points. A graph Laplacian regularizer is then adopted to preserve the local structure of the data during reconstruction. Moreover, an ℓ1-norm regularizer is applied to learn different k values for different test data points, and an ℓ2,1-norm regularizer induces row sparsity that removes redundant/noisy features from the reconstruction process. Beyond classification tasks, the kNN methods (including our proposed CM-kNN method) are further applied to regression and missing-data imputation. We conducted extensive experiments, and the results showed that the proposed method was more accurate and efficient than existing kNN methods in data-mining applications such as classification, regression, and missing-data imputation.
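A stripped-down sketch of the sparse-reconstruction idea: each test point is reconstructed as an ℓ1-penalized (lasso) combination of the training points, and the nonzero weights pick its neighbors, with their count serving as that point's k. This omits the graph Laplacian and ℓ2,1-norm terms of the full CM-kNN model and is only an illustration of the core mechanism.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def sparse_neighbors(x_test, X_train, alpha=0.05):
    """Reconstruct one test point as a sparse combination of training
    points; the indices of nonzero weights are its neighbors."""
    lasso = Lasso(alpha=alpha, max_iter=10000)
    lasso.fit(X_train.T, x_test)  # columns are training samples
    support = np.flatnonzero(lasso.coef_)
    if support.size == 0:  # fall back to the single nearest neighbor
        support = np.array([np.argmin(np.linalg.norm(X_train - x_test, axis=1))])
    return support

neighbors = sparse_neighbors(X_te[0], X_tr)
k = neighbors.size  # the learned k for this particular test point
pred = np.bincount(y_tr[neighbors]).argmax()  # majority vote among neighbors
```

Different test points typically yield different support sizes, which is exactly the per-point k behavior the abstract describes.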
Recent studies on AD/MCI diagnosis have shown that the tasks of identifying brain disease and predicting clinical scores are highly related to each other. Furthermore, it has been shown that feature selection with manifold learning or a sparse model can handle the problems of high feature dimensionality and small sample size. However, clinical score regression and clinical label classification were often conducted separately in previous studies. Regarding feature selection, to the best of our knowledge, most previous work considered a loss function defined as an element-wise difference between the target values and the predicted ones. In this paper, we consider the problem of joint regression and classification for AD/MCI diagnosis and propose a novel matrix-similarity-based loss function that uses high-level information inherent in the target response matrix and imposes this information on the predicted response matrix. The newly devised loss function is combined with a group lasso method for joint feature selection across tasks, i.e., prediction of clinical scores and a class label. To validate the effectiveness of the proposed method, we conducted experiments on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset and showed that the newly devised loss function helped enhance the performance of both clinical score prediction and disease status identification, outperforming state-of-the-art methods.
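The joint feature selection across tasks can be illustrated with an ℓ2,1-penalized multi-task model on synthetic data, here using scikit-learn's MultiTaskLasso as a generic group-lasso stand-in; the paper's matrix-similarity loss itself is not reproduced.

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(0)
n_samples, n_features = 80, 30

# Synthetic stand-in for neuroimaging features: only the first five
# features drive both targets (a clinical score and a label-like score).
X = rng.standard_normal((n_samples, n_features))
W = np.zeros((n_features, 2))
W[:5] = rng.standard_normal((5, 2))
Y = X @ W + 0.1 * rng.standard_normal((n_samples, 2))

# The L2,1 penalty of MultiTaskLasso keeps or drops each feature jointly
# across both tasks, mirroring group-lasso joint feature selection.
model = MultiTaskLasso(alpha=0.1).fit(X, Y)
selected = np.flatnonzero(np.linalg.norm(model.coef_, axis=0))
```

Because the penalty acts on whole coefficient rows, a feature is either retained for both the regression and classification targets or discarded for both, which is the property the abstract relies on.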