Density clustering is widely used across research disciplines to uncover the structure of real-world datasets. Existing density clustering algorithms, however, work well only on complete datasets, whereas real-world datasets often have missing feature values because of technical limitations. Many of the imputation methods used with density clustering cause the aggregation phenomenon. To solve this problem, a novel two-stage density peak clustering approach for data with missing features is proposed. First, the density peak clustering algorithm is applied to the data with complete features, and the labeled core points, which can represent the whole data distribution, are used to train a classifier. Second, a symmetrical FWPD distance matrix is calculated for the incomplete data points; the incomplete data are then imputed using this matrix and classified by the classifier. Experimental results show that the proposed approach performs well on both synthetic and real datasets.
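As an illustration of the first stage only, the following is a minimal sketch of standard density peak clustering on complete data. It is not the authors' implementation: the Gaussian-kernel density, the cutoff distance `dc`, the choice of centers by the largest `rho * delta`, and the function name `density_peak_clustering` are all assumptions made for this example. The classifier training and the symmetrical FWPD imputation of the second stage are not shown.

```python
import numpy as np


def density_peak_clustering(X, n_clusters, dc):
    """Hypothetical sketch of density peak clustering on complete rows X."""
    n = X.shape[0]
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Local density rho: Gaussian kernel with cutoff dc (assumed), self term removed.
    rho = np.exp(-(dist / dc) ** 2).sum(axis=1) - 1.0
    # Indices sorted from highest to lowest density.
    order = np.argsort(-rho)
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    # By convention the densest point gets the largest distance in its row.
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]          # distance to the nearest denser point
        nearest_higher[i] = j
    # Assumed selection rule: centers are the points with the largest rho * delta.
    centers = np.argsort(-(rho * delta))[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Every other point inherits the label of its nearest denser neighbor.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, rho, delta
```

In a two-stage scheme like the one described, the labels returned here for the complete rows would serve as training targets for whatever classifier is later applied to the imputed rows.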
A large amount of information on the Web is presented as structured objects, and mining this object information is of great importance for Web data management. This paper presents a Web object block mining method based on tag similarity. It first constructs a DOM tree for the Web page and calculates the similarity of all possible generalized nodes. A pruning method then filters out redundant information based on the features of noise data and locates the Web object region. Finally, the Web objects are identified within that region. The experimental results show that, compared with IEPAD, our method achieves higher precision.
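The idea of comparing DOM subtrees by their tags can be sketched as follows. This is only an illustration under stated assumptions, not the paper's measure: the abstract does not define its similarity function, so the sketch uses the ratio from Python's `difflib.SequenceMatcher` over pre-order tag sequences, and it compares single adjacent siblings rather than generalized nodes of arbitrary length. The names `Node`, `DomBuilder`, `parse_dom`, `tag_similarity`, and `adjacent_similarities` are hypothetical.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Elements that never have a closing tag (partial list, enough for a sketch).
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []

    def tag_sequence(self):
        """Pre-order list of the tags in this subtree."""
        seq = [self.tag]
        for child in self.children:
            seq.extend(child.tag_sequence())
        return seq


class DomBuilder(HTMLParser):
    """Builds a simplified, tag-only DOM tree."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the matching open element; ignore stray closing tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def parse_dom(html):
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def tag_similarity(a, b):
    """Assumed measure: SequenceMatcher ratio of the two tag sequences."""
    return SequenceMatcher(None, a.tag_sequence(), b.tag_sequence(),
                           autojunk=False).ratio()


def adjacent_similarities(parent):
    """Similarity of each pair of adjacent child subtrees."""
    kids = parent.children
    return [tag_similarity(x, y) for x, y in zip(kids, kids[1:])]
```

Under this assumed measure, runs of adjacent siblings with high similarity are candidates for a repeated object region, while a sibling whose tag structure differs sharply, such as an advertisement block, scores low against its neighbors and could be pruned as noise.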