Word sense induction (WSI), which addresses polysemy by unsupervised discovery of multiple word senses, resolves ambiguities for downstream NLP tasks and also makes word representations more interpretable. This paper proposes an accurate and efficient graph-based method for WSI that builds a global non-negative vector embedding basis (whose dimensions are interpretable like topics) and clusters the basis indices in the ego network of each polysemous word. By adopting distributional inclusion vector embeddings as our basis-formation model, we avoid the expensive nearest-neighbor search step that plagues other graph-based methods, without sacrificing the quality of sense clusters. Experiments on three datasets show that our proposed method produces similar or better sense clusters and embeddings compared with previous state-of-the-art methods while being significantly more efficient.
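The clustering step above can be illustrated with a toy sketch: take a word's most active dimensions in a non-negative embedding, link dimensions whose basis vectors are similar, and read off connected components as senses. The function name, similarity measure, and thresholded-components clustering rule are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def induce_senses(word_vec, basis, top_k=6, sim_threshold=0.5):
    """Toy sketch of graph-based WSI: pick the word's top-k active basis
    dimensions (its "ego network" nodes), link dimensions whose basis
    vectors have high cosine similarity, and return connected components
    of that graph as sense clusters (hypothetical simplification)."""
    # top-k basis indices most active for this word
    nodes = np.argsort(word_vec)[::-1][:top_k]
    # cosine similarity between the corresponding basis vectors
    B = basis[nodes]
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    sim = B @ B.T
    # edges above the threshold; components = induced senses
    adj = sim >= sim_threshold
    senses, seen = [], set()
    for i in range(len(nodes)):
        if i in seen:
            continue
        stack, comp = [i], []
        while stack:
            j = stack.pop()
            if j in seen:
                continue
            seen.add(j)
            comp.append(int(nodes[j]))
            stack.extend(k for k in range(len(nodes))
                         if adj[j, k] and k not in seen)
        senses.append(sorted(comp))
    return senses
```

With a basis containing two groups of mutually similar vectors, the word's dimensions split into two sense clusters, one per group.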
With current economic realities, now is the time to produce "more with less". Exception Based Surveillance (EBS) helps eliminate waste in an engineer's day by removing unnecessary analysis and allowing the engineer to focus on the highest-value tasks. For the surveillance of oil wells, an efficient and effective way to achieve this is to use an automated Exception Based Surveillance system, often augmented by data-driven well-rate estimates. Specifically, this paper provides examples from Shell operations in Gabon, Malaysia, and the Netherlands on how EBS systems have been set up to address day-to-day production challenges. The multiple EBS systems described here have been achieved through the tight integration of real-time data in Well, Reservoir and Facilities Management (WRFM) workflows and the automation of complex calculations and rule sets. This paper also describes the WRFM "Next Generation surveillance tool" (NGT) currently being rolled out in several Shell assets (Clinton 2016). The work described here regarding enhanced Exception Based Surveillance systems and integrated portals goes beyond just deploying tools. To be sustainable and value-adding over existing practices, the introduction of these systems requires the transformation of roles, processes and tools to fully and efficiently leverage and gain value from now mature
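The core idea of an EBS rule set can be sketched in a few lines: compare each well's measured real-time rate against a data-driven estimate and raise an exception only when the deviation exceeds a tolerance, so engineers review exceptions rather than every well. This is a hypothetical illustration of the concept, not Shell's actual rule set or tolerances.

```python
def flag_exceptions(measured, estimated, tolerance=0.15):
    """Hypothetical exception-based surveillance rule: flag wells whose
    measured rate deviates from the data-driven estimate by more than a
    relative tolerance (illustrative names and threshold)."""
    flags = {}
    for well, rate in measured.items():
        est = estimated.get(well)
        if est is None or est == 0:
            continue  # no estimate available; nothing to compare against
        deviation = abs(rate - est) / est
        if deviation > tolerance:
            flags[well] = round(deviation, 3)
    return flags
```

In practice such rules are only one layer; the value comes from wiring them into WRFM workflows so that exceptions arrive with the context needed to act on them.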
Most unsupervised NLP models represent each word with a single point or single region in semantic space, while existing multi-sense word embeddings cannot represent longer word sequences like phrases or sentences. We propose a novel embedding method for a text sequence (a phrase or a sentence) where each sequence is represented by a distinct set of multi-mode codebook embeddings that capture different semantic facets of its meaning. The codebook embeddings can be viewed as cluster centers that summarize the distribution of possibly co-occurring words in a pre-trained word embedding space. We introduce an end-to-end trainable neural model that directly predicts the set of cluster centers from the input text sequence at test time. Our experiments show that the per-sentence codebook embeddings significantly improve performance on unsupervised sentence similarity and extractive summarization benchmarks. In phrase similarity experiments, we discover that the multi-facet embeddings provide an interpretable semantic representation but do not outperform the single-facet baseline.
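The "cluster centers" view above can be made concrete with a sketch: given the pre-trained embeddings of words that co-occur with a sequence, a small k-means loop yields the centers that play the role of the multi-mode codebook. This is an illustrative stand-in for the target of the method, not the paper's trained neural predictor, and all names here are assumptions.

```python
import numpy as np

def codebook_embeddings(cooccurring_vecs, num_codes=3, iters=20, seed=0):
    """Illustrative sketch: summarize the pre-trained embeddings of words
    co-occurring with a sequence by k-means cluster centers, which play
    the role of the multi-mode codebook (not the paper's neural model)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(cooccurring_vecs, dtype=float)
    # initialize codes from randomly chosen word vectors
    centers = X[rng.choice(len(X), size=num_codes, replace=False)]
    for _ in range(iters):
        # assign each word vector to its nearest code
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # move each code to the mean of its assigned vectors
        for k in range(num_codes):
            if (labels == k).any():
                centers[k] = X[labels == k].mean(0)
    return centers
```

The paper's contribution is to predict such centers directly from the input text at test time, rather than running clustering over observed co-occurrences.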