In traditional text clustering methods, documents are represented as "bags of words" without considering the semantic information of each document. For instance, if two documents use different collections of core words to represent the same topic, they may be falsely assigned to different clusters due to the lack of shared core words, although the core words they use are probably synonyms or semantically associated in other forms. The most common way to solve this problem is to enrich document representation with the background knowledge in an ontology. There are two major issues for this approach: (1) the coverage of the ontology is limited, even for WordNet or Mesh, (2) using ontology terms as replacement or additional features may cause information loss, or introduce noise. In this paper, we present a novel text clustering method to address these two issues by enriching document representation with Wikipedia concept and category information. We develop two approaches, exact match and relatedness-match, to map text documents to Wikipedia concepts, and further to Wikipedia categories. Then the text documents are clustered based on a similarity metric which combines document content information, concept information as well as category information. The experimental results using the proposed clustering framework on three datasets (20-newsgroup, TDT2, and LA Times) show that clustering performance improves significantly by enriching document representation with Wikipedia concepts and categories.
Assessing network vulnerability before potential disruptive events such as natural disasters or malicious attacks is vital for network planning and risk management. It enables us to seek and safeguard against most destructive scenarios in which the overall network connectivity falls dramatically. Existing vulnerability assessments mainly focus on investigating the inhomogeneous properties of graph elements, node degree for example, however, these measures and the corresponding heuristic solutions can provide neither an accurate evaluation over general network topologies, nor performance guarantees to large scale networks. To this end, in this paper, we investigate a measure called pairwise connectivity and formulate this vulnerability assessment problem as a new graph-theoretical optimization problem called β-disruptor, which aims to discover the set of critical node/edges, whose removal results in the maximum decline of the global pairwise connectivity. Our results consist of the NP-Completeness and inapproximability proof of this problem, an O(log n log log n) pseudo-approximation algorithm for detecting the set of critical nodes and an O(log 1.5 n) pseudoapproximation algorithm for detecting the set of critical edges. In addition, we devise an efficient heuristic algorithm and validate the performance of the our model and algorithms through extensive simulations.
Abstract-Minimum-latency beaconing schedule (MLBS) in synchronous multihop wireless networks seeks a schedule for beaconing with the shortest latency. This problem is NP-hard even when the interference radius is equal to the transmission radius. All prior works assume that the interference radius is equal to the transmission radius, and the best-known approximation ratio for MLBS under this special interference model is 7. In this paper, we present a new approximation algorithm called strip coloring for MLBS under the general protocol interference model. Its approximation ratio is at most 5 when the interference radius is equal to transmission radius, and is between 3 and 6 in general.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.