Many social Web sites allow users to publish content and annotate it with descriptive metadata. In addition to flat tags, some social Web sites have recently begun to allow users to organize their content and metadata hierarchically. The social photosharing site Flickr, for example, allows users to group related photos in sets, and related sets in collections. The social bookmarking site Del.icio.us similarly lets users group related tags into bundles. Although the sites themselves don't impose any constraints on how these hierarchies are used, individuals generally use them to capture relationships between concepts, most commonly broader/narrower relations. Collective annotation of content with hierarchical relations may lead to an emergent classification system, called a folksonomy. While some researchers have explored using tags as evidence for learning folksonomies, we believe that the hierarchical relations described above offer a high-quality source of evidence for this task. We propose a simple approach to aggregate shallow hierarchies created by many distinct Flickr users into a common folksonomy. Our approach uses statistics to determine whether a particular relation should be retained or discarded. The relations are then woven together into larger hierarchies. Although we have not carried out a detailed quantitative evaluation of the approach, it looks very promising since it generates very reasonable, non-trivial hierarchies.
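The retain-or-discard step and the weaving step described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not say which statistic is used, so a plain user-count threshold stands in for it here, and the names `aggregate_relations`, `weave`, and `min_users` are hypothetical.

```python
from collections import Counter, defaultdict


def aggregate_relations(user_hierarchies, min_users=2):
    """Keep (parent, child) relations asserted by enough distinct users.

    Assumption: a simple count threshold replaces the unspecified
    statistical test. A relation is also dropped when its reverse
    (child, parent) is asserted by at least as many users.
    """
    counts = Counter()
    for relations in user_hierarchies.values():
        # set() so that each user votes at most once per relation
        for parent, child in set(relations):
            counts[(parent, child)] += 1

    kept = set()
    for (parent, child), n in counts.items():
        reverse = counts.get((child, parent), 0)
        if n >= min_users and n > reverse:
            kept.add((parent, child))
    return kept


def weave(relations):
    """Join retained relations into a parent -> sorted children mapping."""
    children = defaultdict(set)
    for parent, child in relations:
        children[parent].add(child)
    return {parent: sorted(kids) for parent, kids in children.items()}


# Toy data: three users, each with a shallow personal hierarchy.
users = {
    "u1": [("animal", "bird"), ("bird", "owl")],
    "u2": [("animal", "bird"), ("bird", "owl")],
    "u3": [("bird", "animal"), ("animal", "cat")],
}
kept = aggregate_relations(users, min_users=2)
tree = weave(kept)
```

In this toy run, the two relations shared by `u1` and `u2` survive and chain into a deeper hierarchy (animal, then bird, then owl), while the relations asserted only by `u3` are discarded.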
Many social Web sites allow users to annotate content with descriptive metadata, such as tags, and more recently to organize content hierarchically. These types of structured metadata provide valuable evidence for learning how a community organizes knowledge. For instance, we can aggregate many personal hierarchies into a common taxonomy, also known as a folksonomy, that will aid users in visualizing and browsing social content and help them organize their own content. However, learning from social metadata presents several challenges, since it is sparse, shallow, ambiguous, noisy, and inconsistent. We describe an approach to folksonomy learning based on relational clustering, which exploits structured metadata contained in personal hierarchies. Our approach clusters similar hierarchies using their structure and tag statistics, then incrementally weaves them into a deeper, bushier tree. We study folksonomy learning using social metadata extracted from the photosharing site Flickr, and demonstrate that the proposed approach addresses these challenges. Moreover, compared to previous work, the approach produces larger, more accurate folksonomies and scales better.
Learning structured representations has emerged as an important problem in many domains, including document and Web data mining, bioinformatics, and image analysis. One approach to learning complex structures is to integrate many smaller, incomplete and noisy structure fragments. In this work, we present an unsupervised probabilistic approach that extends affinity propagation [7] to combine small ontological fragments into a collection of integrated, consistent, and larger folksonomies. This is a challenging task because the method must aggregate similar structures while avoiding structural inconsistencies and handling noise. We validate the approach on a real-world social media dataset, composed of shallow personal hierarchies specified by many individual users, collected from the photosharing website Flickr. Our empirical results show that the proposed approach constructs deeper and denser structures than an approach using only the standard affinity propagation algorithm. Additionally, the approach yields better overall integration quality than a state-of-the-art approach based on incremental relational clustering.
Given a dataset of careers and incomes, how large is the difference in income between any pair of careers? Given a dataset of travel time records, how much longer does a trip take when choosing public transportation mode A instead of B? In this paper, we propose a framework that infers orders of categories, as well as the magnitudes of difference in real-valued measurements between each pair of categories, using estimation statistics. Our framework reports not only whether an order of categories exists, but also the magnitude of difference between each consecutive pair of categories in the order. On large datasets, our framework scales well compared with existing frameworks. The proposed framework has been applied to two real-world case studies: 1) ordering careers by incomes from 350,000 households living in Khon Kaen province, Thailand, and 2) ordering sectors by closing prices from 1,060 companies in the NASDAQ stock market between 2000 and 2016. The career-ordering results demonstrate income inequality among different careers. The stock market results illustrate dynamics of sector domination that can change over time. Our approach can be applied in any research area that has category-real pairs. Our proposed Dominant-Distribution Network provides a novel way to gain new insight when analyzing category orders. Software implementing this framework is available to researchers and practitioners as an R package on CRAN: EDOIF.
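As a rough illustration of the estimation-statistics idea in this abstract, the sketch below bootstraps a confidence interval for the difference in means between two categories and treats one category as dominating the other when the whole interval lies above zero. This is an assumption-laden sketch in Python, not the EDOIF implementation: the function names, the percentile interval, and the dominance rule are all illustrative choices, and the data are made up.

```python
import random
from statistics import mean


def bootstrap_mean_diff_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(b) - mean(a).

    Assumption: a plain percentile interval; the abstract does not
    specify which interval estimator the framework uses.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each category with replacement, keeping its size.
        resampled_a = [rng.choice(a) for _ in a]
        resampled_b = [rng.choice(b) for _ in b]
        diffs.append(mean(resampled_b) - mean(resampled_a))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def dominates(a, b, **kwargs):
    """Hypothetical rule: b dominates a if the whole CI is above zero."""
    lower, upper = bootstrap_mean_diff_ci(a, b, **kwargs)
    return lower > 0, (lower, upper)


# Made-up measurements for two categories.
category_a = [10, 11, 12, 9, 10, 11]
category_b = [20, 21, 19, 22, 20, 21]
b_dominates_a, (lower, upper) = dominates(category_a, category_b)
```

The interval gives the magnitude of the difference between the two categories, not just its sign. Repeating the comparison over every pair of categories would give the edges of an ordering, which is the kind of structure the abstract calls a Dominant-Distribution Network.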
Recently, there has been a great deal of interest in analyzing inherent structures in posts on microblogs such as Twitter. While many works utilize a well-known topic modeling technique, we instead propose to apply Affinity Propagation [4] (AP) to analyze such a corpus, and we hypothesize that AP may provide a different perspective from the traditional approach. Our preliminary analysis surfaces some interesting facts and issues, which suggest future research directions.