Abstract. Classifications have been used for centuries with the goal of cataloguing and searching large sets of objects. In the early days these objects were mainly books; more recently they have also come to include Web pages, pictures, and any kind of digital resource. Classifications describe their contents using natural language labels, an approach which has proved very effective in manual classification. However, natural language labels show their limitations when one tries to automate the process, as they make it very hard to reason about classifications and their contents. In this paper we introduce the novel notion of Formal Classification, a graph structure whose labels are written in a propositional concept language. Formal Classifications turn out to be a form of lightweight ontology. This, in turn, allows us to reason about them, to associate with each node a normal form formula which univocally describes its contents, and to reduce document classification and query answering to reasoning about subsumption.
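To make the reduction to subsumption concrete, the following is a minimal sketch, not the paper's method: it treats a node formula and a document formula as propositional formulas over a small set of atoms and checks subsumption by brute-force truth-table enumeration. The atom names, the two example formulas, and the background axiom (italy implies europe) are all hypothetical, chosen only for illustration.

```python
from itertools import product

def subsumes(general, specific, atoms):
    """Return True if every truth assignment that satisfies `specific`
    also satisfies `general`, i.e. `specific -> general` is valid."""
    for values in product([False, True], repeat=len(atoms)):
        env = dict(zip(atoms, values))
        if specific(env) and not general(env):
            return False  # counterexample: specific holds, general does not
    return True

# Hypothetical atoms and formulas (assumptions, not taken from the paper).
atoms = ["europe", "italy", "picture"]

# Node formula: "pictures of Europe".
node_europe_pictures = lambda e: e["europe"] and e["picture"]

# Document formula: "picture of Italy", conjoined with the assumed
# background axiom italy -> europe.
doc_italy_picture = lambda e: (
    e["italy"] and e["picture"] and ((not e["italy"]) or e["europe"])
)

# The document can be classified under the node if the node subsumes it.
print(subsumes(node_europe_pictures, doc_italy_picture, atoms))
print(subsumes(doc_italy_picture, node_europe_pictures, atoms))
```

Enumeration is exponential in the number of atoms, so this only illustrates the idea; a practical system would hand the same validity check to a SAT solver.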
Based on an effective clustering algorithm, Affinity Propagation (AP), we present in this paper a novel semi-supervised text-clustering algorithm called Seeds Affinity Propagation (SAP). There are two main contributions in our approach: (1) a new similarity metric that captures the structural information of texts; (2) a novel seed construction method to improve the semi-supervised clustering process. To study the performance of the new algorithm, we applied it to the benchmark data set Reuters-21578 and compared it to two state-of-the-art clustering algorithms, namely the k-means algorithm and the original AP algorithm. Furthermore, we analyzed the individual impact of the two proposed contributions. Results show that the proposed similarity metric is more effective in text clustering (F-measures ca. 21% higher than in the AP algorithm) and that the proposed semi-supervised strategy achieves both better clustering results and faster convergence (using only 76% of the iterations of the original AP). The complete SAP algorithm obtains a higher F-measure (ca. 40% improvement over k-means and AP) and lower entropy (ca. 28% decrease over k-means and AP), significantly improves clustering execution time (twenty times faster) with respect to k-means, and provides enhanced robustness compared with all other methods.
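The F-measure and entropy figures above are external clustering quality measures. The sketch below shows one commonly used formulation of each, assuming class-size-weighted best-match F-measure and cluster-size-weighted entropy with natural logarithms; the abstract does not state the exact definitions used, so these formulas and the function names are assumptions.

```python
from collections import Counter
from math import log

def clustering_entropy(labels, clusters):
    """Size-weighted average of the class entropy inside each cluster.
    Lower is better; 0 means every cluster contains a single class."""
    n = len(labels)
    total = 0.0
    for c in set(clusters):
        members = [l for l, k in zip(labels, clusters) if k == c]
        size = len(members)
        h = -sum((cnt / size) * log(cnt / size)
                 for cnt in Counter(members).values())
        total += size / n * h
    return total

def clustering_f_measure(labels, clusters):
    """For each true class take the best F1 over all clusters, then
    average those scores weighted by class size. Higher is better."""
    n = len(labels)
    class_sizes = Counter(labels)
    cluster_sizes = Counter(clusters)
    joint = Counter(zip(labels, clusters))
    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = joint[(cls, clu)]
            if n_ij == 0:
                continue
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += n_i / n * best
    return total

# Toy data (hypothetical): two classes, perfect vs. fully mixed clustering.
labels = ["a", "a", "b", "b"]
perfect = [0, 0, 1, 1]
mixed = [0, 1, 0, 1]
print(clustering_f_measure(labels, perfect), clustering_entropy(labels, perfect))
print(clustering_f_measure(labels, mixed), clustering_entropy(labels, mixed))
```

On the toy data, the perfect clustering scores an F-measure of 1 and an entropy of 0, while the fully mixed clustering scores lower on the first and higher on the second, which is the direction of improvement reported for SAP.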
In this paper we review several novel approaches to research evaluation. We start with a brief overview of peer review, its controversies, and metrics for assessing its efficiency and overall quality. We then discuss five approaches, including reputation-based ones, that come out of the research carried out by the LiquidPub project and by research groups that collaborated with LiquidPub. These approaches are alternative or complementary to traditional peer review. We discuss the pros and cons of the proposed approaches and conclude with a vision for the future of research evaluation, arguing that no single system can suit all stakeholders in the various communities.