Objectives/Scope: We propose a methodology based on document embedding techniques for applying Technology Intelligence Analysis in the Oil and Gas (O&G) domain. We build a specialized O&G corpus and train a Vector Space Model (VSM) that represents each document as a vector, such that the distance between two vectors captures their semantic similarity. We explore different analyses on this VSM to infer relations between documents and obtain new insights in a strategic context.

Methods, Procedures: The proposed methodology relies on Natural Language Processing (NLP) techniques to obtain strategic insights in a technology intelligence analysis scenario. It consists of generating a VSM induced from a domain-specific Oil and Gas corpus composed of thousands of scientific articles collected from the Elsevier online database. We explore an approach that represents different entities (such as articles, authors and keywords) in the same vector space, making it possible to correlate them and infer similarity relations based on their cosine distance. An evaluation metric is also provided to assist the training process and hyperparameter optimization.

Results, Observations, Conclusions: The highly technical Oil and Gas vocabulary is a challenge for NLP applications, since some terms take on a completely different meaning from the one they have in general-domain text. In this scenario, gathering an Oil and Gas corpus and training specialized vector space models for this specific domain increases the quality of Technology Intelligence Analysis. The most significant finding is that we were able to make explicit the semantic relationships between different entities of interest in the same VSM, and to link these relationships with additional metadata. An interesting application is to compare the publications of authors affiliated with two or more O&G companies at a given time. These non-trivial correlations are important for gaining strategic insights in a Technology Intelligence Analysis scenario.

Novel/Additive Information: The novelty of the proposed methodology is the possibility of exploring new insights by correlating different entities in a technology intelligence scenario for the Oil and Gas domain, using a simple yet efficient approach based on document embedding techniques. The method applies advanced NLP techniques to process more than a hundred thousand documents in a few seconds without requiring complex hardware resources, which would be impractical using traditional techniques.
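
The idea of placing articles and authors in one vector space and comparing them by cosine distance can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not state how entity vectors are built, so here an author is simply the mean of the vectors of that author's documents, and the three-dimensional vectors, document names and author names are hypothetical toy values rather than output of a trained model. Cosine distance is taken as 1 minus the cosine similarity computed below.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical toy document vectors; a trained model would use many more dimensions.
doc_vectors = {
    "doc_drilling_1": np.array([0.9, 0.1, 0.0]),
    "doc_drilling_2": np.array([0.8, 0.2, 0.1]),
    "doc_seismic_1": np.array([0.1, 0.9, 0.2]),
}

# Hypothetical authorship metadata linking authors to their documents.
author_docs = {
    "author_a": ["doc_drilling_1", "doc_drilling_2"],
    "author_b": ["doc_seismic_1"],
}


def entity_vector(doc_ids):
    """Assumption: an entity is represented by the mean of its documents' vectors."""
    return np.mean([doc_vectors[d] for d in doc_ids], axis=0)


author_vectors = {name: entity_vector(ids) for name, ids in author_docs.items()}


def most_similar(query, candidates):
    """Rank candidate names by cosine similarity to the query vector, best first."""
    scores = {name: cosine_similarity(query, vec) for name, vec in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Which author is closest to the seismic article in the shared space?
ranking = most_similar(doc_vectors["doc_seismic_1"], author_vectors)
```

Because documents and authors live in the same space, the same `most_similar` helper can rank authors against an article, articles against an author, or authors against each other. Grouping authors by affiliation metadata before averaging would give a comparable vector per company under the same assumption.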