Feature selection is an important challenge in many areas of machine learning because it plays a crucial role in the interpretation of machine-driven decisions. There are various approaches to the feature selection problem, and methods based on information theory comprise an important group. Among these, minimum redundancy maximum relevance (mRMR) feature selection is undoubtedly the most popular, with widespread application. In this paper, we prove, in contrast to an existing finding, that mRMR is not equivalent to the Max-Dependency criterion for first-order incremental feature selection. We present another form of equivalence that leads to a generalization of mRMR feature selection. Additionally, we compare several feature selection methods based on mRMR, Max-Dependency, and feature ranking, employing different measures of dependency. The results on high-dimensional real-world datasets show that distance correlation is a suitable measure for dependency-based feature selection methods. The results also indicate that the Max-Dependency incremental algorithm combined with distance correlation appears to be a promising feature selection approach.
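To make the two incremental schemes named in this abstract concrete, the following is a minimal sketch and not the paper's implementation. It assumes the standard biased (V-statistic) sample estimator of distance correlation and the classic "relevance minus mean redundancy" form of mRMR, with distance correlation swapped in as the dependency measure. The function names `distance_correlation`, `mrmr_select`, and `max_dependency_select` are hypothetical and chosen for illustration only.

```python
import numpy as np


def _centered_distances(v):
    """Double-centered matrix of pairwise Euclidean distances between samples."""
    v = np.asarray(v, dtype=float)
    if v.ndim == 1:
        v = v[:, None]  # treat a 1-D input as n samples of one variable
    d = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Biased sample distance correlation; x and y may be 1-D or (n, d) arrays."""
    a, b = _centered_distances(x), _centered_distances(y)
    dcov2 = (a * b).mean()
    dvar2 = (a * a).mean() * (b * b).mean()
    if dvar2 <= 0.0:  # a constant variable carries no dependency
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar2)))


def mrmr_select(X, y, k, dependency=distance_correlation):
    """Greedy mRMR (assumed difference form): relevance minus mean redundancy."""
    X = np.asarray(X, dtype=float)
    relevance = [dependency(X[:, j], y) for j in range(X.shape[1])]
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in remaining:
            redundancy = (
                np.mean([dependency(X[:, j], X[:, s]) for s in selected])
                if selected else 0.0
            )
            score = relevance[j] - redundancy
            if score > best_score:  # strict ">" keeps the first index on ties
                best_j, best_score = j, score
        selected.append(best_j)
        remaining.remove(best_j)
    return selected


def max_dependency_select(X, y, k, dependency=distance_correlation):
    """Greedy Max-Dependency: add the feature that maximizes the dependency
    between the whole selected set (taken jointly) and the target."""
    X = np.asarray(X, dtype=float)
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < k:
        best_j = max(remaining, key=lambda j: dependency(X[:, selected + [j]], y))
        selected.append(best_j)
        remaining.remove(best_j)
    return selected
```

The difference between the two loops is where the dependency measure is applied: `mrmr_select` only ever compares pairs of variables, while `max_dependency_select` scores the candidate subset as a single multivariate variable, which distance correlation supports directly. Both routines recompute pairwise distance matrices on every call, so this sketch is only practical for modest sample sizes.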
Objective: Within the PhysioNet/Computing in Cardiology Challenge 2021, we focused on the design of a machine learning algorithm to identify cardiac abnormalities from electrocardiogram recordings (ECGs) with varying numbers of leads and to assess the diagnostic potential of reduced-lead ECGs compared to standard 12-lead ECGs. Approach: In our solution, we developed a model based on a deep convolutional neural network, which is a 1D variant of the popular ResNet50 network. This base model was pre-trained on a large training set with our proposed mapping of original labels to SNOMED codes, using three-valued labels. In the next phase, the model was fine-tuned for the Challenge metric and conditions. Main results: In the Challenge, our proposed approach (team CeZIS) achieved a Challenge test score of 0.52 for all lead configurations, placing us 5th out of 39 in the official ranking. Our improved post-Challenge solution was evaluated as the best for all ranked configurations, i.e., for the 12-lead, 3-lead, and 2-lead versions of the full test set, with Challenge test scores of 0.62, 0.61, and 0.59, respectively. Significance: In addition to building the model for identifying cardiac anomalies, we provide a more detailed description of the issues associated with label mapping and propose a modification of it to obtain a better starting point for training more powerful classification models. We compare the performance of models for different numbers of leads and identify labels for which two leads are sufficient. Moreover, we evaluate the label quality in individual parts of the Challenge training set.