This study investigates the effect of class imbalance in training data when developing neural network classifiers for computer-aided medical diagnosis. The investigation is performed in the presence of other characteristics that are typical of medical data, namely small training sample size, a large number of features, and correlations between features. Two methods of neural network training are explored: classical backpropagation (BP) and particle swarm optimization (PSO) with clinically relevant training criteria. An experimental study is performed using simulated data, and the conclusions are further validated on real clinical data for breast cancer diagnosis. The results show that classifier performance deteriorates with even modest class imbalance in the training data. Further, BP is shown to be generally preferable to PSO for imbalanced training data, especially with a small data sample and a large number of features. Finally, no clear preference is found between oversampling and the no-compensation approach, and some guidance is provided on choosing between them.
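The oversampling compensation mentioned above can be illustrated with a minimal sketch. The abstract does not say which oversampling variant was used, so this assumes the simplest one, random duplication of minority-class samples until every class matches the largest class. The function name `random_oversample` and its parameters are hypothetical and are not taken from the study.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of the smaller classes until
    every class has as many samples as the largest class.

    Assumption: plain random duplication; the study may have used a
    different oversampling scheme.
    """
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    # Every class is grown to the size of the largest class.
    target = max(len(xs) for xs in by_class.values())

    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


if __name__ == "__main__":
    # Toy data: four negative cases and one positive case.
    X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
    y = [0, 0, 0, 0, 1]
    X_bal, y_bal = random_oversample(X, y)
    print(Counter(y_bal))
```

Duplicating samples adds no new information, which is consistent with the abstract's finding that oversampling is not clearly better than leaving the imbalance uncompensated.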
The purpose of this study was to investigate an information-theoretic approach to feature selection for computer-aided diagnosis (CAD). The approach is based on the concept of mutual information (MI). MI measures the general dependence of random variables without making any assumptions about the nature of their underlying relationships. Consequently, MI can potentially offer some advantages over feature selection techniques that focus only on the linear relationships between variables. This study was based on a database of statistical texture features extracted from perfusion lung scans. The ultimate goal was to select the optimal subset of features for the computer-aided diagnosis of acute pulmonary embolism (PE). Initially, the study addressed issues regarding the approximation of MI in a limited dataset, as is often the case in CAD applications. The MI-selected features were compared to those selected using stepwise linear discriminant analysis and genetic algorithms for the same PE database. Linear and nonlinear decision models were implemented to merge the selected features into a final diagnosis. Results showed that MI is an effective feature selection criterion for nonlinear CAD models, overcoming some of the well-known limitations and computational complexities of other popular feature selection techniques in the field.
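To make the MI criterion concrete, here is a minimal sketch of a plug-in (histogram) estimate of the mutual information between one feature and a class label. The abstract does not state how MI was approximated, so the equal-width binning and the function names `equal_width_bins` and `mutual_information` are illustrative assumptions, not the study's method.

```python
import math
from collections import Counter


def equal_width_bins(values, n_bins):
    """Map continuous values to integer bin indices 0..n_bins-1.

    Assumption: equal-width binning, chosen only for simplicity.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits for two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with counts.
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


if __name__ == "__main__":
    # A symmetric, nonlinear relationship: the label depends on |feature|.
    feature = [-2.0, -1.0, 0.0, 1.0, 2.0]
    label = [1, 0, 0, 0, 1]
    print(mutual_information(equal_width_bins(feature, 5), label))
```

In the toy data the label depends on the magnitude of the feature, so the linear correlation between them is zero while the MI estimate is positive. With few samples and many bins the plug-in estimate is biased upward, which is one reason approximating MI in a limited dataset needs care.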
The initial process for creating a flexible three-dimensional computer-generated breast phantom based on empirical data is described. Dedicated breast computed-tomography data were processed to suppress noise and scatter artifacts in the reconstructed image set. An automated algorithm was developed to classify the breast into its primary components. A preliminary phantom defined using subdivision surfaces was generated from the segmented data. To demonstrate potential applications of the phantom, simulated mammographic image data were generated from it using a simplistic compression model and an analytic projection algorithm applied directly to the surface model. The simulated image was generated using a model for a polyenergetic cone-beam projection of the compressed phantom. The resulting images show a high level of tissue structure detail and appear similar to actual mammograms. Fractal dimension measurements of simulated images of the phantom are comparable to measurements from images of real human subjects. A realistic and geometrically defined breast phantom that can accurately simulate imaging data may have many applications in breast imaging research.
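As an illustration of what a fractal dimension measurement involves, the sketch below estimates a box-counting dimension for a binary image. The abstract does not say which estimator was used, so box counting is an assumption made here for illustration, and the names `box_count` and `box_counting_dimension` are hypothetical.

```python
import math


def box_count(image, size):
    """Count size-by-size boxes that contain at least one nonzero pixel."""
    rows, cols = len(image), len(image[0])
    count = 0
    for r0 in range(0, rows, size):
        for c0 in range(0, cols, size):
            if any(image[r][c]
                   for r in range(r0, min(r0 + size, rows))
                   for c in range(c0, min(c0 + size, cols))):
                count += 1
    return count


def box_counting_dimension(image, sizes=(1, 2, 4, 8)):
    """Least-squares slope of log N(s) against log(1/s).

    Assumption: a binary image stored as a list of rows; a grayscale
    image would need thresholding or a different estimator first.
    """
    xs, ys = [], []
    for s in sizes:
        n = box_count(image, s)
        if n > 0:
            xs.append(math.log(1.0 / s))
            ys.append(math.log(n))
    if len(xs) < 2:
        raise ValueError("need at least two non-empty scales")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    # Sanity checks: a filled square and a single straight line.
    filled = [[1] * 16 for _ in range(16)]
    line = [[1] * 16] + [[0] * 16 for _ in range(15)]
    print(box_counting_dimension(filled), box_counting_dimension(line))
```

A filled square and a straight line are convenient checks because their dimensions are known to be 2 and 1; textured images such as mammograms would give non-integer values.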