Sung-Hyon Myaeng scite author profile

How can we find patterns in an enormous graph with billions of vertices and edges? Subgraph enumeration, the task of finding all occurrences of a pattern in a graph, is important for graph data analysis and has many applications, including analyzing the evolution of social networks, measuring the significance of motifs in biological networks, and observing the dynamics of the Internet. Triangle enumeration, the special case in which the pattern is a triangle, is used for identifying suspicious users in social networks, detecting web spam, and finding communities. However, recent networks are so large that most previous algorithms fail to process them. Several MapReduce algorithms have recently been proposed to handle such large networks, but they suffer from massive amounts of shuffled data, which results in very long processing times. In this article, we propose scalable methods for enumerating trillions of subgraphs on distributed systems. We first propose PTE (Pre-partitioned Triangle Enumeration), a new distributed algorithm for enumerating triangles in enormous graphs that resolves the structural inefficiency of the previous MapReduce algorithms. PTE enumerates trillions of triangles in a billion-scale graph by decreasing three factors: the amount of shuffled data, the total work, and the network read. We also propose PSE (Pre-partitioned Subgraph Enumeration), a generalized version of PTE for enumerating subgraphs that match an arbitrary query graph. Experimental results show that PTE is 79 times faster than recent distributed algorithms on real-world graphs and succeeds in enumerating more than 3 trillion triangles on the ClueWeb12 graph, which has 6.3 billion vertices and 72 billion edges. Furthermore, PSE successfully enumerates 265 trillion 4-vertex clique subgraphs from a subdomain hyperlink network, running 47 times faster than the state-of-the-art distributed subgraph enumeration algorithm.
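To make the triangle enumeration task concrete, here is a minimal single-machine sketch in Python. It is not PTE and involves no partitioning or distribution; it uses the common degree-ordering technique so that each triangle is reported exactly once. The function name `enumerate_triangles` and the edge-list input format are assumptions made for illustration only.

```python
from collections import defaultdict


def enumerate_triangles(edges):
    """Return every triangle of an undirected graph exactly once.

    Illustrative single-machine sketch (NOT the PTE algorithm): `edges`
    is an iterable of (u, v) pairs whose vertex ids are mutually comparable.
    """
    # Build an undirected adjacency structure, ignoring self-loops.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    # Rank vertices by (degree, id) so that ties are broken deterministically.
    rank = {v: (len(adj[v]), v) for v in adj}

    # Orient each edge from the lower-ranked to the higher-ranked endpoint.
    out = {v: {w for w in adj[v] if rank[w] > rank[v]} for v in adj}

    triangles = []
    for u in out:
        for v in out[u]:
            # A common out-neighbour w of u and v closes the triangle (u, v, w).
            for w in out[u] & out[v]:
                triangles.append(tuple(sorted((u, v, w))))
    return triangles
```

For example, `enumerate_triangles([(1, 2), (2, 3), (1, 3), (3, 4)])` finds the single triangle on vertices 1, 2, and 3. Orienting edges by rank means each triangle is discovered only from its lowest-ranked vertex, which avoids duplicate reports without a separate deduplication pass.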

Domain-specific sentiment analysis using contextual feature generation

Choi, Kim, Myaeng (2009)

This paper presents a novel framework for sentiment analysis that exploits sentiment topic information to generate context-driven features. Because the domain-specific nature of sentiment classification makes the task more difficult, it is essential to consider contextual information such as topic or domain. In our system, we first automatically extract sentiment clues in different domains, guided by the following observation: a sentiment clue is often syntactically related to a sentiment topic in a sentence, where a sentiment topic is defined as the primary subject of a sentiment expression, such as an event, a company, or a person. We bootstrap from a small set of seed clues and generate new clues by utilizing linguistic dependencies and collocation information between sentiment clues and sentiment topics. Next, we learn a domain-specific sentiment classifier for each domain with the newly aggregated clues. We ran experiments to see how the bootstrapping algorithm converges and aggregates new clues, and we verified that the extracted domain-context features are more effective than commonly used features in sentiment analysis by running both on the same sentiment classifier.

Automatic discovery of technology trends from patent text

Kim, Tian, Jeong, et al. (2009)

Patent text is a rich source for discovering technological progress, which is useful for understanding trends and forecasting upcoming advances. With this importance in mind, several researchers have attempted textual data mining from patent documents. However, previous mining methods are limited in terms of readability, domain expertise, and adaptability. In this paper, we first formulate the task of technological trend discovery and propose a method for discovering such trends. We complement a probabilistic approach with linguistic clues and propose an unsupervised procedure to discover technological trends. Our experiment shows that the method is promising not only in its accuracy, 77% in R-precision, but also in its functionality and its novelty in discovering meaningful technological trends.
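For readers unfamiliar with the metric, R-precision is conventionally the precision of the top R ranked results, where R is the number of items judged relevant. The sketch below shows that standard definition only; the abstract does not say how the authors built their ranked lists or relevance judgments, so the function name `r_precision` and its inputs are illustrative assumptions.

```python
def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant items.

    Illustrative sketch of the standard metric: `ranked` is a list of
    item ids in ranked order and `relevant` is a collection of relevant ids.
    """
    relevant = set(relevant)
    r = len(relevant)
    if r == 0:
        # No relevant items: the metric is undefined, so report 0.0 by convention here.
        return 0.0
    # Count how many of the top-R ranked items are actually relevant.
    hits = sum(1 for item in ranked[:r] if item in relevant)
    return hits / r
```

With `ranked = ["a", "b", "c", "d"]` and `relevant = {"a", "c", "x"}`, R is 3, two of the top three items are relevant, and the score is 2/3.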

Sung-Hyon Myaeng

Identifying Controversial Issues and Their Sub-topics in News Articles

Enumerating Trillion Subgraphs On Distributed Systems
