Document clustering can be applied in document organisation and browsing, document summarisation and classification. Identifying an appropriate representation for textual documents is extremely important for the performance of clustering and classification algorithms. Textual documents suffer from high dimensionality and from irrelevant text features. In addition, conventional clustering algorithms suffer from several shortcomings, such as slow convergence and sensitivity to initial values. To tackle these problems, metaheuristic algorithms are frequently applied to clustering. In this paper, an improved ant clustering algorithm is presented, in which two novel heuristic methods are proposed to enhance the clustering quality of ant-based clustering. In addition, latent Dirichlet allocation (LDA) is used to represent textual documents in a compact and efficient way. The clustering quality of the proposed ant clustering algorithm is compared with that of conventional clustering algorithms on 25 text benchmarks in terms of F-measure values. The experimental results indicate that the proposed clustering scheme outperforms the compared conventional and metaheuristic clustering methods for textual documents.
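The abstract does not say which F-measure variant is used for the comparison. As an illustration only, the sketch below assumes the class-weighted clustering F-measure common in the document clustering literature: for each true class, take the best F-score over all clusters, then average these weighted by class size. The function name `clustering_f_measure` and its arguments are hypothetical and are not taken from the paper.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_ids):
    """Class-weighted clustering F-measure (assumed variant, not confirmed by the paper).

    For class i and cluster j: precision = n_ij / n_j, recall = n_ij / n_i,
    F(i, j) = 2 * P * R / (P + R). The overall score is
    sum_i (n_i / n) * max_j F(i, j).
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("true_labels and cluster_ids must have the same length")
    n = len(true_labels)
    if n == 0:
        raise ValueError("at least one document is required")

    class_sizes = Counter(true_labels)                # n_i per true class
    cluster_sizes = Counter(cluster_ids)              # n_j per cluster
    overlap = Counter(zip(true_labels, cluster_ids))  # n_ij per (class, cluster)

    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = overlap.get((cls, clu), 0)
            if n_ij == 0:
                continue  # no shared documents, F(i, j) is zero
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best
    return total
```

Under this definition a clustering that reproduces the true classes exactly scores 1.0, whatever the cluster identifiers are called, while putting every document into a single cluster is penalised through low precision.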
We describe an information system architecture for the ACES (Asia-Pacific Cooperation for Earthquake Simulation) community. It addresses several key features of the field: simulations at multiple scales that need to be coupled together; real-time and archival observational data, which need to be analyzed for patterns and linked to the simulations; a variety of important algorithms including partial differential equation solvers, particle dynamics, signal processing and data analysis; a natural three-dimensional space (plus time) setting for both visualization and observations; and the linkage of the field to real-time events, both as an aid to crisis management and to scientific discovery. We also address the need to support education and research for a field whose computational sophistication is increasing rapidly and spans a broad range. The information system assumes that all significant data is defined by an XML layer, which could be virtual but whose existence ensures that all data is object-based and can be accessed and searched in this form. The various capabilities needed by ACES are defined as Grid Services, which conform to emerging standards and are implemented with different levels of fidelity and performance appropriate for the application. Grid Services can be composed in a hierarchical fashion to address complex problems. The real-time needs of the field are addressed by high-performance implementation of data transfer and simulation services; further, the environment is linked to real-time collaboration to support interactions between scientists in geographically distant locations.
ACES Grid and .opennet Grid Architecture

We consider an ACES [1] computational environment (ACESCE) built in terms of web-based user interfaces accessing services, which are built in a broker-based fashion [2]. The client machine contacts a server that acts as an intermediary to back-end resources and also as a conduit for clients to access services. One can also view the brokers as middleware wrappers that allow a heterogeneous collection of resources to be accessed in a relatively uniform fashion. In the simplest technology, these brokers or wrappers would be implemented as a Perl CGI program running on a web server. As discussed later, there are more sophisticated approaches but the basic model is correct.
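The broker idea described here can be illustrated with a minimal sketch. It is written in Python rather than Perl CGI and is not the ACESCE implementation: the `Broker` class, the service names, and the two toy back-ends (`legacy_solver`, `ArchiveStore`) are all hypothetical. It shows only how a wrapper layer can give back-ends with different native calling conventions a single request/response shape.

```python
from typing import Any, Callable, Dict


def legacy_solver(grid_size: int, steps: int) -> int:
    """Hypothetical back-end with a positional-argument interface."""
    return grid_size * steps  # stand-in for a real simulation result


class ArchiveStore:
    """Hypothetical back-end with an object-style interface."""

    def __init__(self) -> None:
        self._records = {"event-1": "archived observation"}

    def fetch(self, key: str) -> str:
        return self._records[key]


class Broker:
    """Intermediary that exposes every back-end through one uniform call."""

    def __init__(self) -> None:
        self._services: Dict[str, Callable[[Dict[str, Any]], Any]] = {}

    def register(self, name: str, wrapper: Callable[[Dict[str, Any]], Any]) -> None:
        # Each wrapper adapts one back-end to the shared dict-in, value-out form.
        self._services[name] = wrapper

    def handle(self, name: str, params: Dict[str, Any]) -> Dict[str, Any]:
        wrapper = self._services.get(name)
        if wrapper is None:
            return {"status": "error", "message": f"unknown service: {name}"}
        try:
            return {"status": "ok", "result": wrapper(params)}
        except (KeyError, TypeError) as exc:
            # Missing or malformed parameters are reported, not raised.
            return {"status": "error", "message": repr(exc)}


def build_demo_broker() -> Broker:
    """Wire the two hypothetical back-ends behind one broker."""
    store = ArchiveStore()
    broker = Broker()
    broker.register("simulate", lambda p: legacy_solver(p["grid_size"], p["steps"]))
    broker.register("archive", lambda p: store.fetch(p["key"]))
    return broker
```

A client would then issue every request in the same way, for example `build_demo_broker().handle("simulate", {"grid_size": 4, "steps": 3})`, without knowing how the underlying resource is actually invoked.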