a b s t r a c tWe show that the recently introduced label propagation method for detecting communities in complex networks is equivalent to finding the local minima of a simple Potts model. Applying to empirical data, the number of such local minima was found to be very high, much larger than the number of nodes in the graph. The aggregation method for combining information from more local minima shows a tendency to fragment the communities into very small pieces.
Tagging items with descriptive annotations or keywords is a very natural way to compress and highlight information about the properties of the given entity. Over the years several methods have been proposed for extracting a hierarchy between the tags for systems with a "flat", egalitarian organization of the tags, which is very common when the tags correspond to free words given by numerous independent people. Here we present a complete framework for automated tag hierarchy extraction based on tag occurrence statistics. Along with proposing new algorithms, we are also introducing different quality measures enabling the detailed comparison of competing approaches from different aspects. Furthermore, we set up a synthetic, computer generated benchmark providing a versatile tool for testing, with a couple of tunable parameters capable of generating a wide range of test beds. Beside the computer generated input we also use real data in our studies, including a biological example with a pre-defined hierarchy between the tags. The encouraging similarity between the pre-defined and reconstructed hierarchy, as well as the seemingly meaningful hierarchies obtained for other real systems indicate that tag hierarchy extraction is a very promising direction for further research with a great potential for practical applications. Tags have become very prevalent nowadays in various online platforms ranging from blogs through scientific publications to protein databases. Furthermore, tagging systems dedicated for voluntary tagging of photos, films, books, etc. with free words are also becoming popular. The emerging large collections of tags associated with different objects are often referred to as folksonomies, highlighting their collaborative origin and the “flat” organization of the tags opposed to traditional hierarchical categorization. Adding a tag hierarchy corresponding to a given folksonomy can very effectively help narrowing or broadening the scope of search. Moreover, recommendation systems could also benefit from a tag hierarchy.
Academic journals are the repositories of mankind's gradually accumulating knowledge of the surrounding world. Just as knowledge is organized into classes ranging from major disciplines, subjects and fields, to increasingly specific topics, journals can also be categorized into groups using various metric. In addition, they can be ranked according to their overall influence. However, according to recent studies, the impact, prestige and novelty of journals cannot be characterized by a single parameter such as, for example, the impact factor. To increase understanding of journal impact, the knowledge gap we set out to explore in our study is the evaluation of journal relevance using complex multi-dimensional measures. Thus, for the first time, our objective is to organize journals into multiple hierarchies based on citation data. The two approaches we use are designed to address this problem from different perspectives. We use a measure related to the notion of m-reaching centrality and find a network that shows a journal's level of influence in terms of the direction and efficiency with which information spreads through the network. We find we can also obtain an alternative network using a suitably modified nested hierarchy extraction method applied to the same data. In this case, in a self-organized way, the journals become branches according to the major scientific fields, where the local structure of the branches reflect the hierarchy within the given field, with usually the most prominent journal (according to other measures) in the field chosen by the algorithm as the local root, and more specialized journals positioned deeper in the branch. This can make the navigation within different scientific fields and subfields very simple, and equivalent to navigating in the different branches of the nested hierarchy. We expect this to be particularly helpful, for example, when choosing the most appropriate journal for a given manuscript. According to our results, the two alternative hierarchies show a somewhat different, but also consistent, picture of the intricate relations between scientific journals, and, as such, they also provide a new perspective on how scientific knowledge is organized into networks.
Community detection methods have so far been tested mostly on small empirical networks and on synthetic benchmarks. Much less is known about their performance on large real-world networks, which nonetheless are a significant target for application. We analyze the performance of three state-of-the-art community detection methods by using them to identify communities in a large social network constructed from mobile phone call records. We find that all methods detect communities that are meaningful in some respects but fall short in others, and that there often is a hierarchical relationship between communities detected by different methods. Our results suggest that community detection methods could be useful in studying the general mesoscale structure of networks, as opposed to only trying to identify dense structures.
We investigate how in complex systems the eigenpairs of the matrices derived from the correlations of multichannel observations reflect the cluster structure of the underlying networks. For this we use daily return data from the NYSE and focus specifically on the spectral properties of weight W_{ij} = |C|_{ij} - \delta_{ij} and diffusion matrices D_{ij} = W_{ij}/s_j- \delta_{ij}, where C_{ij} is the correlation matrix and s_i = \sum_j W_{ij} the strength of node j. The eigenvalues (and corresponding eigenvectors) of the weight matrix are ranked in descending order. In accord with the earlier observations the first eigenvector stands for a measure of the market correlations. Its components are to first approximation equal to the strengths of the nodes and there is a second order, roughly linear, correction. The high ranking eigenvectors, excluding the highest ranking one, are usually assigned to market sectors and industrial branches. Our study shows that both for weight and diffusion matrices the eigenpair analysis is not capable of easily deducing the cluster structure of the network without a priori knowledge. In addition we have studied the clustering of stocks using the asset graph approach with and without spectrum based noise filtering. It turns out that asset graphs are quite insensitive to noise and there is no sharp percolation transition as a function of the ratio of bonds included, thus no natural threshold value for that ratio seems to exist. We suggest that these observations can be of use for other correlation based networks as well.Comment: 26 pages, 14 figure
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.