How can we find patterns in an enormous graph with billions of vertices and edges? Subgraph enumeration, the task of finding all occurrences of a pattern in a graph, is important for graph data analysis and has many applications, including analyzing the evolution of social networks, measuring the significance of motifs in biological networks, and observing the dynamics of the Internet. Triangle enumeration in particular, the special case of subgraph enumeration in which the pattern is a triangle, has many applications, such as identifying suspicious users in social networks, detecting web spam, and finding communities. However, recent networks are so large that most previous algorithms fail to process them. Several MapReduce algorithms have recently been proposed to handle such large networks; however, they suffer from massive amounts of shuffled data, which results in very long processing times. In this article, we propose scalable methods for enumerating trillions of subgraphs on distributed systems. We first propose PTE (Pre-partitioned Triangle Enumeration), a new distributed algorithm for enumerating triangles in enormous graphs that resolves the structural inefficiency of previous MapReduce algorithms. PTE enumerates trillions of triangles in a billion-scale graph by decreasing three factors: the amount of shuffled data, the total work, and the network read. We also propose PSE (Pre-partitioned Subgraph Enumeration), a generalized version of PTE for enumerating subgraphs that match an arbitrary query graph. Experimental results show that PTE is 79 times faster than recent distributed algorithms on real-world graphs, and that it succeeds in enumerating more than 3 trillion triangles on the ClueWeb12 graph with 6.3 billion vertices and 72 billion edges.
Furthermore, PSE successfully enumerates 265 trillion clique subgraphs with 4 vertices from a subdomain hyperlink network, running 47 times faster than the state-of-the-art distributed subgraph enumeration algorithm.
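The abstract does not spell out how the pre-partitioning works, so the following is only a minimal single-machine sketch of the general idea, under stated assumptions: vertices are given a color by a hash (here simply `v % num_colors`, assuming integer vertex ids), edges are bucketed once by the colors of their endpoints, and each subproblem enumerates triangles over the buckets for one set of colors. The function names `triangles_in` and `partitioned_triangles` are hypothetical, and this is not the paper's exact scheme or its distributed implementation.

```python
from collections import defaultdict
from itertools import combinations


def triangles_in(edges):
    """Yield each triangle of an undirected edge list once, as (u, v, w) with u < v < w."""
    # Orient every edge from the smaller to the larger vertex id.
    out = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        a, b = (u, v) if u < v else (v, u)
        out[a].add(b)
    for u in list(out):
        for v in out[u]:
            # Any w adjacent to both u and v (with v < w) closes a triangle.
            for w in out[u] & out.get(v, set()):
                yield (u, v, w)


def partitioned_triangles(edges, num_colors=3):
    """Enumerate triangles by solving one subproblem per set of vertex colors.

    Assumption: the color of a vertex is v % num_colors (a stand-in hash).
    """
    def color(v):
        return v % num_colors

    # Partition the edges once, keyed by the sorted colors of their endpoints.
    buckets = defaultdict(list)
    for u, v in edges:
        key = tuple(sorted((color(u), color(v))))
        buckets[key].append((u, v))

    found = []
    for size in (1, 2, 3):
        for colors in combinations(range(num_colors), size):
            # A subproblem reads only the buckets whose colors lie in its color set.
            sub_edges = [
                e
                for (i, j), bucket in buckets.items()
                if i in colors and j in colors
                for e in bucket
            ]
            for tri in triangles_in(sub_edges):
                # Emit a triangle only in the subproblem matching its exact
                # color set, so that every triangle is reported exactly once.
                if {color(x) for x in tri} == set(colors):
                    found.append(tri)
    return sorted(found)


if __name__ == "__main__":
    # A 4-clique on vertices 0..3 plus one dangling edge.
    demo = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
    print(partitioned_triangles(demo, num_colors=3))
```

In a distributed setting each subproblem could run on a separate worker that reads only its own edge buckets, which is the sense in which partitioning up front can avoid reshuffling edges for every triangle candidate.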
This paper presents a novel framework for sentiment analysis that exploits sentiment topic information to generate context-driven features. Because the domain-specific nature of sentiment classification makes the task more difficult, it is essential to consider additional contextual information such as topic or domain. In our system, we first automatically extract sentiment clues in different domains, guided by the following observation: a sentiment clue is often syntactically related to a sentiment topic in a sentence, where a sentiment topic is defined as the primary subject of a sentiment expression, such as an event, a company, or a person. We bootstrap from a small set of seed clues and generate new clues by utilizing linguistic dependencies and collocation information between sentiment clues and sentiment topics. Next, we learn a domain-specific sentiment classifier for each domain with the newly aggregated clues. We ran experiments to see how the bootstrapping algorithm converges and aggregates new clues, and we verified that the extracted domain-context features are more effective than commonly used sentiment analysis features by running both on the same sentiment classifier.
Patent text is a rich source for discovering technological progress, useful for understanding trends and forecasting upcoming advances. With this importance in mind, several researchers have attempted to mine textual data from patent documents. However, previous mining methods are limited in terms of readability, domain expertise, and adaptability. In this paper, we first formulate the task of technological trend discovery and propose a method for discovering such trends. We complement a probabilistic approach with linguistic clues and propose an unsupervised procedure to discover technological trends. Our experiments show that the method is promising not only in its accuracy, 77% in R-precision, but also in its functionality and its novelty in discovering meaningful technological trends.
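For readers unfamiliar with the metric, R-precision is the precision of a ranked list at cutoff R, where R is the number of relevant items. The sketch below shows the standard definition only; the function name `r_precision` and the toy data are hypothetical and are not taken from the paper's evaluation.

```python
def r_precision(ranked, relevant):
    """Return the fraction of the top-R ranked items that are relevant,
    where R is the total number of relevant items."""
    relevant = set(relevant)
    r = len(relevant)
    if r == 0:
        return 0.0  # convention chosen here for an empty relevant set
    hits = sum(1 for item in ranked[:r] if item in relevant)
    return hits / r


if __name__ == "__main__":
    # Toy example: 4 relevant items, 3 of which appear in the top 4 ranks.
    ranked = ["a", "x", "b", "c", "y"]
    relevant = {"a", "b", "c", "d"}
    print(r_precision(ranked, relevant))  # 0.75
```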