When dealing with datasets of high-dimensional points, it is usually advantageous to discover some underlying structure in the data. A fundamental piece of information needed for this purpose is the minimum number of parameters required to describe the data while minimizing information loss. This number, usually called the intrinsic dimension, can be interpreted as the dimension of the manifold from which the input data are assumed to be drawn. Because of its usefulness in many theoretical and practical problems, the concept of intrinsic dimension has gained considerable attention in the scientific community over the last decades, motivating the large number of intrinsic dimensionality estimators proposed in the literature. However, the problem is still open, since most techniques cannot efficiently deal with datasets drawn from manifolds of high intrinsic dimension that are nonlinearly embedded in higher-dimensional spaces. This paper surveys some of the most interesting, widely used, and advanced state-of-the-art methodologies. Unfortunately, since no benchmark database exists in this research field, an objective comparison among different techniques is not possible. Consequently, we suggest a benchmark framework and apply it to comparatively evaluate relevant state-of-the-art estimators.
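To make the notion of intrinsic dimension concrete, the following is a minimal sketch of a simple linear (PCA-based) estimate: it counts the principal components needed to reach a chosen fraction of the total variance. It is an illustration only and is not taken from the surveyed paper; the function name `pca_intrinsic_dimension`, the `variance_threshold` parameter, and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def pca_intrinsic_dimension(points, variance_threshold=0.95):
    """Hypothetical helper: number of principal components needed to
    explain at least `variance_threshold` of the total variance."""
    centered = points - points.mean(axis=0)
    # Squared singular values of the centered data are proportional
    # to the variance along each principal axis.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    cumulative = np.cumsum(variances) / variances.sum()
    # Index of the first component at which the threshold is reached.
    return int(np.searchsorted(cumulative, variance_threshold) + 1)

# Synthetic data (an assumption for illustration): a 2-dimensional
# Gaussian cloud linearly embedded in 10 dimensions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
basis, _ = np.linalg.qr(rng.normal(size=(10, 2)))  # orthonormal columns
data = latent @ basis.T

estimated = pca_intrinsic_dimension(data)
print(estimated)
```

Because PCA is linear, an estimate of this kind tends to overestimate the dimension when the manifold is nonlinearly embedded, which is the setting the abstract identifies as still open.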
In the past decades, a great deal of research has been devoted to the development of systems that could improve radiologists' accuracy in detecting lung nodules. Despite these efforts, the problem is still open. In this paper, we present a fully automated system for processing digital postero-anterior (PA) chest radiographs, which starts by producing an accurate segmentation of the lung field area. The segmented lung area includes even those parts of the lungs hidden behind the heart, the spine, and the diaphragm, which are usually excluded by the methods presented in the literature. This decision is motivated by the fact that lung nodules may also be found in these areas. The segmented area is processed with a simple multiscale method that enhances the visibility of the nodules, and an extraction scheme is then applied to select potential nodules. To reduce the high number of false positives extracted, cost-sensitive support vector machines (SVMs) are trained to recognize the true nodules. Different learning experiments were performed on two different data sets, created by means of feature selection, employing Gaussian and polynomial SVMs trained with different parameters; the results are reported and compared. With the best SVM models, we obtain about 1.5 false positives per image (fp/image) when sensitivity is approximately 0.71; this number increases to about 2.5 and 4 fp/image when sensitivity is approximately 0.78 and 0.85, respectively. For the highest sensitivities (approximately 0.92 and 1.0), we get 7 or 8 fp/image.