Data volumes are growing rapidly, which makes it difficult to extract valuable information from big data. Most existing data mining algorithms handle big data problems only at large time and space costs. This paper focuses on the sampling problem for big data and proposes an efficient heuristic Cluster Sampling Arithmetic, called CSA. Many earlier researchers used random sampling to extract an initial sample set from the original data and then processed that sample in various ways to obtain a corresponding minimum sample set, which is taken to represent the original big data set. However, the initial random sampling step severely affects the final processing results, lowering their comprehensiveness and quality and lengthening processing time. In view of this, CSA introduces clustering to obtain the minimum sample set of big data, in contrast to the random sampling methods in the current literature. CSA performs cluster analysis on the original data set and selects the center of each class as a member of the minimum sample set. The aim is to ensure that the sample distribution matches the characteristics of the original data, to preserve data integrity, and to reduce processing time. The max–min distance method from pattern recognition is integrated into the clustering process to obtain the cluster centers and to keep the algorithm from becoming trapped in a local optimum. Experimental results show that, compared with existing work, CSA efficiently reflects the characteristics of the original data and reduces data processing time. The resulting minimum sample set also performs well when used in classification algorithms.
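The center-selection idea described above can be illustrated with the classical max-min distance rule from pattern recognition. The following is a minimal sketch under stated assumptions, not the CSA implementation itself: the function names `max_min_centers` and `assign_to_centers`, the threshold parameter `theta`, the use of Euclidean distance, and the choice of the first point as the initial center are all hypothetical choices made for illustration.

```python
import numpy as np

def max_min_centers(points, theta=0.5):
    """Return indices of seed centers chosen by the max-min distance rule.

    Assumptions (not from the source): Euclidean distance, the first point
    is the initial center, and a new center is accepted only while its
    distance to the nearest existing center exceeds theta times the
    distance between the first two centers.
    """
    points = np.asarray(points, dtype=float)
    centers = [0]
    # Distance from every point to its nearest already-chosen center.
    nearest = np.linalg.norm(points - points[0], axis=1)
    base = nearest.max()  # distance between the first two centers
    if base == 0.0:
        return centers  # all points coincide
    while True:
        candidate = int(np.argmax(nearest))  # farthest from all centers
        if nearest[candidate] <= theta * base:
            break
        centers.append(candidate)
        dist = np.linalg.norm(points - points[candidate], axis=1)
        nearest = np.minimum(nearest, dist)
    return centers

def assign_to_centers(points, centers):
    """Label each point with the position of its nearest center."""
    points = np.asarray(points, dtype=float)
    diffs = points[:, None, :] - points[centers][None, :, :]
    return np.argmin(np.linalg.norm(diffs, axis=2), axis=1)
```

Under these assumptions, `points[max_min_centers(points)]` would serve as a small representative sample. The sketch picks existing data points as seeds; a clustering step that recomputes each class center afterwards, as the abstract suggests, is not shown.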
Computer-aided detection (CAD) of lobulation can help radiologists diagnose and detect lung diseases more easily and accurately. Compared with CAD of nodules and other lung lesions, CAD of lobulation has remained an unexplored problem because of the very complex and varying nature of lobulation, and many state-of-the-art methods fail to detect it. Hence, in this paper we revisit classical methods capable of extracting undulating characteristics and design a sliding-window-based framework for lobulation detection. Under this framework, we investigate three categories of lobulation classification algorithms: template matching, feature-based classifiers, and bending energy. The resulting detection algorithms were evaluated experimentally on the LISS database. The experimental results show that the algorithm combining a global context feature with BOF encoding has the best overall performance, with an F1 score of 0.1009. Furthermore, the bending energy method is shown to be suitable for reducing false positives: when we applied it after the LIOP-LBP mixture feature, the average number of positive detections per image fell from 30 to 22, and the F1 score increased from 0.0599 to 0.0643. To the best of our knowledge, this is the first work on direct lobulation detection and the first application of bending energy to any lobulation-related task.
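To make the bending energy measure mentioned above more concrete, the sketch below computes one common definition, the mean squared curvature of a closed contour, using finite differences. This is an illustrative assumption rather than the paper's exact formulation: the function name `bending_energy`, the (N, 2) contour representation, and the unweighted average over sample points are all hypothetical choices.

```python
import numpy as np

def bending_energy(contour):
    """Mean squared curvature of a closed contour given as an (N, 2) array.

    Assumptions (not from the source): points are ordered along the
    boundary, roughly evenly spaced, and no two neighbours coincide
    (coincident neighbours would cause a division by zero).
    """
    pts = np.asarray(contour, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    # Central differences with wrap-around, since the contour is closed.
    dx = (np.roll(x, -1) - np.roll(x, 1)) / 2.0
    dy = (np.roll(y, -1) - np.roll(y, 1)) / 2.0
    ddx = np.roll(x, -1) - 2.0 * x + np.roll(x, 1)
    ddy = np.roll(y, -1) - 2.0 * y + np.roll(y, 1)
    # Signed curvature of a parametric curve: (x'y'' - y'x'') / |v|^3.
    speed_sq = dx**2 + dy**2
    curvature = (dx * ddy - dy * ddx) / speed_sq**1.5
    return float(np.mean(curvature**2))
```

For a circle of radius r the curvature is 1/r everywhere, so the value is close to 1/r². A contour with pronounced undulations scores higher, which is why a threshold on this quantity could plausibly be used to discard smooth candidates.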