Word embedding, a lexical vector representation generated by a neural linguistic model (NLM), has been shown empirically to improve the performance of traditional language models. However, the high dimensionality inherent in NLMs leads to a large number of hyperparameters and long training times. Here, we propose a force-directed method that mitigates these problems and simplifies the generation of word embeddings. In this framework, each word is treated as a point in the real world, so its movement can be approximately simulated according to certain mechanics. To simulate how meaning varies within phrases, we use fracture mechanics to model the formation and breakdown of the meaning carried by a 2-gram word group. In experiments on the natural language processing tasks of part-of-speech tagging, named entity recognition, and semantic role labeling, the 2-dimensional word embedding rivals the word embeddings generated by classic NLMs in terms of accuracy, recall, and text visualization.
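The idea of treating words as points that move under forces can be illustrated with a generic force-directed layout. The sketch below is not the authors' model: it omits the fracture-mechanics component entirely and assumes a simple scheme in which words that co-occur in a 2-gram attract each other like springs while all word pairs repel. The function name `force_directed_embedding` and the parameters `k`, `iterations`, and `seed` are hypothetical.

```python
import math
import random
from collections import Counter


def force_directed_embedding(sentences, iterations=200, k=1.0, seed=0):
    """Place each word at a 2-D point (illustrative assumption, not the paper's method).

    Words that co-occur in a 2-gram attract each other; every pair of words repels.
    """
    # Ordered vocabulary and undirected 2-gram co-occurrence counts.
    vocab = list(dict.fromkeys(w for sent in sentences for w in sent))
    weights = Counter()
    for sent in sentences:
        for a, b in zip(sent, sent[1:]):
            if a != b:
                weights[frozenset((a, b))] += 1

    rng = random.Random(seed)
    pos = {w: [rng.random(), rng.random()] for w in vocab}

    for it in range(iterations):
        temp = 0.1 * (1.0 - it / iterations)  # maximum step, cooled linearly
        disp = {w: [0.0, 0.0] for w in vocab}
        for i, a in enumerate(vocab):
            for b in vocab[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) or 1e-9
                # Net force on a along (a - b): repulsion minus weighted attraction.
                f = k * k / d - weights[frozenset((a, b))] * d * d / k
                fx, fy = f * dx / d, f * dy / d
                disp[a][0] += fx
                disp[a][1] += fy
                disp[b][0] -= fx
                disp[b][1] -= fy
        for w in vocab:
            fx, fy = disp[w]
            norm = math.hypot(fx, fy)
            if norm > 0.0:
                scale = min(norm, temp) / norm  # clip the move to the temperature
                pos[w][0] += fx * scale
                pos[w][1] += fy * scale

    return {w: tuple(p) for w, p in pos.items()}


if __name__ == "__main__":
    corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
    for word, point in force_directed_embedding(corpus).items():
        print(word, point)
```

The only inputs are 2-gram counts, so the sketch has no trainable weights and only a handful of settings, which is the kind of simplification the abstract argues for.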
This paper investigates a homotopy-based method for embedding hundreds of thousands of data items, which yields a parallel algorithm suitable for running on a distributed system. Current eigenvalue-based embedding algorithms sparsify the distance matrix to approximate a low-dimensional representation when handling large-scale data sets. The main reason for this approximation is that the embedding process is still hindered by the eigendecomposition bottleneck for high-dimensional matrices. In this study, a homotopy continuation algorithm is applied to improve this embedding model by parallelizing the corresponding eigendecomposition. The eigenvalue problem is converted into an initial value problem for ordinary differential equations, and all isolated positive eigenvalues and their corresponding eigenvectors can be obtained in parallel by tracking the predicted eigenpaths. Experiments on real data sets show that the homotopy-based approach has the potential to scale to data sets with millions of items.
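To make the eigenpath idea concrete, the following is a minimal sketch of homotopy continuation for a small symmetric matrix. It assumes the linear homotopy H(t) = D + t(A - D), where D is the diagonal of A, so that the starting eigenpairs at t = 0 are known exactly; the abstract does not state which starting matrix or integrator the authors use. Each path is advanced with an Euler predictor on the eigenpath ODE and a Newton corrector. The names `track_eigenpath` and `_bordered` are hypothetical, and the sketch tracks every eigenvalue rather than only the positive ones.

```python
import numpy as np


def _bordered(H, lam, x):
    """Jacobian of F(x, lam) = [H x - lam x, (x.x - 1) / 2]."""
    n = H.shape[0]
    J = np.zeros((n + 1, n + 1))
    J[:n, :n] = H - lam * np.eye(n)
    J[:n, n] = -x
    J[n, :n] = x
    return J


def track_eigenpath(A, i, steps=20, newton_iters=3):
    """Follow the i-th eigenpair of H(t) = D + t (A - D) from t = 0 to t = 1."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    D = np.diag(np.diag(A))
    E = A - D                      # dH/dt for the linear homotopy
    x = np.zeros(n)
    x[i] = 1.0                     # eigenvector of D
    lam = A[i, i]                  # eigenvalue of D
    dt = 1.0 / steps
    for k in range(steps):
        # Predictor: Euler step on the ODE obtained by differentiating
        # H(t) x = lam x and x.x = 1 with respect to t.
        H = D + (k * dt) * E
        d = np.linalg.solve(_bordered(H, lam, x), np.append(-E @ x, 0.0))
        x, lam = x + dt * d[:n], lam + dt * d[n]
        # Corrector: a few Newton iterations at the new value of t.
        H = D + ((k + 1) * dt) * E
        for _ in range(newton_iters):
            F = np.append(H @ x - lam * x, 0.5 * (x @ x - 1.0))
            d = np.linalg.solve(_bordered(H, lam, x), -F)
            x, lam = x + d[:n], lam + d[n]
    return lam, x / np.linalg.norm(x)


if __name__ == "__main__":
    A = np.array([[1.0, 0.3, 0.2, 0.1],
                  [0.3, 3.0, 0.3, 0.2],
                  [0.2, 0.3, 6.0, 0.3],
                  [0.1, 0.2, 0.3, 10.0]])
    # Each call is independent of the others, so the paths could be
    # distributed across workers; here they run serially.
    for i in range(A.shape[0]):
        lam, v = track_eigenpath(A, i)
        print(i, lam, np.linalg.norm(A @ v - lam * v))
```

Because no path depends on any other, the work splits naturally by eigenpair. This sketch assumes well-separated eigenvalues; closely spaced eigenpaths would need adaptive step sizes, which are omitted here.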
Open data sources on conflicts are increasingly enriched by broad social media; together these yield a volume of information that exceeds our processing capabilities. A critical factor is that extracting knowledge from mixed data formats requires systematic, sophisticated modeling. Here, we propose using text mining tools to build associations among heterogeneous semi-structured data and thereby enhance decision-making. Using narrative plots, text representation, and cluster analysis, we provide a data association framework that can mine spatiotemporal data occurring in similar contexts. The framework contains the following steps: (1) a novel text representation is presented to vectorize the textual semantics by learning both co-word features and word orders in a unified form; (2) text clustering technology is employed to associate events of interest with similar events in historical logs, based solely on narrative plots of the events; and (3) the inferred activity procedure is visualized via an evolving spatiotemporal map through the Kriging algorithm. Our results demonstrate that the approach gives deeper insight into the trends underlying conflicts and achieves a precision of 0.4817 in forward prediction based on narrative reasoning, in addition to high consistency with the conclusions of existing studies.
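Step (2), associating an event of interest with similar events in a historical log, can be sketched as a nearest-neighbour lookup over text vectors. The example below does not reproduce the paper's representation or clustering method: as a stand-in it assumes unigram counts (a rough proxy for co-word features) combined with bigram counts (a rough proxy for word order), ranked by cosine similarity. The function names and the sample narratives are hypothetical.

```python
import math
from collections import Counter


def featurize(text):
    """Count unigrams and adjacent-word bigrams in one narrative (stand-in features)."""
    tokens = text.lower().split()
    feats = Counter(tokens)
    feats.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return feats


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(query, history, top_n=1):
    """Return the top_n historical narratives most similar to the query."""
    q = featurize(query)
    ranked = sorted(history, key=lambda h: cosine(q, featurize(h)), reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    history = [
        "armed clash near the border town",
        "peace talks resume in the capital",
        "protest march in the city centre",
    ]
    print(most_similar("armed clash reported near border", history))
```

In the framework described above, the matched historical events would then supply the locations and times that the Kriging step interpolates into a map; that step is not shown here.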