Peer Data Management Systems (PDMSs) are advanced P2P applications in which each peer represents an autonomous data source that makes an exported schema available to be shared with other peers. Query answering in PDMSs can be improved if peers are efficiently arranged in the overlay network according to the similarity of their content. The set of peers can be partitioned into clusters so that the semantic similarity among the peers participating in the same cluster is maximal. The creation and maintenance of clusters is a challenging problem at the current stage of development of PDMSs. This work proposes an incremental peer clustering process. The authors present a PDMS architecture designed to facilitate the connection of new peers according to their exported schema, which is described by an ontology. They propose a clustering process and its underlying algorithm, and they present and discuss experimental results on peer clustering obtained with the approach.
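The idea of connecting each new peer incrementally can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes that an exported schema can be reduced to a set of ontology concept names, uses Jaccard overlap as a stand-in for the semantic similarity measure, and applies a leader-style rule in which a peer joins the most similar cluster or opens a new one. The names `Cluster`, `connect_peer` and `threshold` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap between two concept sets (assumed stand-in similarity)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


@dataclass
class Cluster:
    # Hypothetical cluster summary: union of the concepts exported by its peers.
    concepts: set[str] = field(default_factory=set)
    peers: list[str] = field(default_factory=list)

    def add(self, peer_id: str, schema: set[str]) -> None:
        self.peers.append(peer_id)
        self.concepts |= schema


def connect_peer(clusters: list[Cluster], peer_id: str, schema: set[str],
                 threshold: float = 0.3) -> Cluster:
    """Place one arriving peer: join the most similar cluster or open a new one."""
    best = max(clusters, key=lambda c: jaccard(c.concepts, schema), default=None)
    if best is None or jaccard(best.concepts, schema) < threshold:
        # No existing cluster is similar enough, so the peer starts its own.
        best = Cluster()
        clusters.append(best)
    best.add(peer_id, schema)
    return best


# Peers arrive one at a time; existing assignments are never recomputed.
clusters: list[Cluster] = []
connect_peer(clusters, "p1", {"Patient", "Doctor", "Hospital"})
connect_peer(clusters, "p2", {"Patient", "Doctor", "Clinic"})
connect_peer(clusters, "p3", {"Author", "Paper", "Journal"})
print([c.peers for c in clusters])
```

In this toy run the two health-related peers end up in one cluster and the bibliographic peer in another. The threshold value is arbitrary and would need tuning against a real similarity measure.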
This work was supported by the European Commission through the Cooperation Programme under EUBra-BIGSEA Horizon 2020 Grant 690116. [This project results from the 3rd BR-EU Coordinated Call in Information and Communication Technologies (ICT), announced by the Ministry of Science, Technology and Innovation (MCTI).]
The lack of hierarchical relations in the tag space of social tagging systems may diminish the ability of users to find relevant resources. Many research works propose to overcome this problem by automatically constructing hierarchies of tags by means of heuristic algorithms. These hierarchies encode subsumption relations between pairs of tags and can be used to improve browsing and retrieval of resources. In this paper, we cast the problem of subsumption detection between pairs of tags as a pairwise classification problem. From the literature, we identified several similarity measures that are good indicators of subsumption and use them as learning features. Under this setting, we observed severe class imbalance and class overlap, which motivated us to investigate and employ class imbalance techniques to overcome these problems. We conducted a comprehensive set of experiments on a large real-world dataset, showing that our approach outperforms the best-performing heuristic-based baseline.
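The pairwise setting can be sketched as follows. The abstract does not name the similarity measures or the imbalance techniques used, so this example makes its own assumptions: two co-occurrence features (Jaccard overlap and an inclusion ratio) and random oversampling of the minority class. The toy data in `TAGGED`, the gold pairs, and the function names are hypothetical.

```python
import random

# Hypothetical toy data: tag -> set of resource ids annotated with that tag.
TAGGED = {
    "animal": {1, 2, 3, 4, 5, 6},
    "dog": {1, 2, 3},
    "cat": {4, 5},
    "car": {7, 8},
}


def pair_features(parent, child):
    """Two assumed co-occurrence features for the ordered pair (parent, child)."""
    p, c = TAGGED[parent], TAGGED[child]
    both = len(p & c)
    jaccard = both / len(p | c)
    # Share of the child's resources that also carry the parent tag.
    inclusion = both / len(c)
    return [jaccard, inclusion]


def oversample(rows, labels, seed=0):
    """Random oversampling: duplicate minority-class rows until classes balance."""
    rng = random.Random(seed)
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        # Nothing to duplicate when one class is absent.
        return list(rows), list(labels)
    minority_label = 1 if minority is pos else 0
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return rows + extra, labels + [minority_label] * len(extra)


# Every ordered pair of distinct tags is one instance; few of them are positive.
pairs = [(a, b) for a in TAGGED for b in TAGGED if a != b]
gold = {("animal", "dog"), ("animal", "cat")}  # hypothetical subsumption labels
X = [pair_features(a, b) for a, b in pairs]
y = [1 if pair in gold else 0 for pair in pairs]
X_bal, y_bal = oversample(X, y)
print(sum(y), len(y), sum(y_bal), len(y_bal))
```

The balanced rows could then be passed to any binary classifier. Oversampling addresses only the imbalance, not the class overlap mentioned above, and it should be applied to training data only.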