In the Oil and Gas industry, technical staff are involved in critical activities every day. Safety is therefore a key priority, all the more so as frontier, continuously evolving technologies become a fundamental part of the transformation of traditional industrial processes. While safety reports and investigations have long been adequately stored and continuously monitored by expert professionals, Artificial Intelligence applied to natural language now makes it possible to develop a decision support system that extracts insights, predicts the risk of future operations, performs scenario analysis and prescribes risk mitigation actions over massive amounts of data. In this work, we used an Open Innovation approach to develop a Safety Pre-Sense system that leverages Machine Learning and Natural Language Processing techniques and incorporates multiple (and often unexpected) sources of information. Starting from standard Natural Language Processing tasks, we leverage linguistic patterns to build binary Document-Term Matrices. Operating on these matrices, we implemented a Domain Keyword Extraction algorithm to extract words (or multi-word expressions) that have high specificity. Our pipeline also provides a language-agnostic method to detect similarities between documents written in different languages and cluster them accordingly, in order to obtain clear descriptors that can be used to understand their meaning. To do so, we map our text into a high-dimensional vector space, where we apply cluster analysis to group semantically close documents into consistent, multilingual groups. We then extract, for each language, a list of domain keywords that characterize every cluster. Next, we identify similarities in the data in a completely data-driven manner, with the objective of extracting correlations between event features (such as geographical location and cause or type of event).
As a result, we extract new aggregations of complex items such as severe accidents or work processes. We also demonstrate how Correspondence Analysis and Pattern Mining algorithms can extract correlations between topics and events, which we visualize in a dynamic Qlik dashboard. Finally, we point to additional sources of information, both internal and external to our company, that can be used to enhance our analysis.
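The abstract above does not specify how the binary Document-Term Matrices are built or how specificity is scored, so the following is only a minimal sketch under stated assumptions. It tokenizes on whitespace, records term presence (1) or absence (0) per document, and scores each term by how much more often it appears in a domain corpus than in a general reference corpus. The function names, the reference corpus and the scoring formula are all hypothetical, chosen for illustration rather than taken from the paper.

```python
def binary_dtm(docs):
    """Build a binary document-term matrix from a non-empty list of strings.

    Returns (vocab, matrix), where matrix[i][j] is 1 if vocab[j] occurs
    in docs[i] and 0 otherwise. Tokenization is naive whitespace splitting.
    """
    tokenized = [set(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*tokenized))
    matrix = [[1 if term in tokens else 0 for term in vocab] for tokens in tokenized]
    return vocab, matrix


def doc_frequency(vocab, matrix):
    """Fraction of documents in which each term occurs."""
    n_docs = len(matrix)
    return {term: sum(row[j] for row in matrix) / n_docs for j, term in enumerate(vocab)}


def specificity(domain_docs, reference_docs):
    """Hypothetical specificity score (an assumption, not the paper's formula).

    Score = domain document frequency / (domain + reference document frequency).
    1.0 means the term appears only in the domain corpus; 0.5 means it is
    equally common in both.
    """
    d_vocab, d_matrix = binary_dtm(domain_docs)
    r_vocab, r_matrix = binary_dtm(reference_docs)
    d_df = doc_frequency(d_vocab, d_matrix)
    r_df = doc_frequency(r_vocab, r_matrix)
    return {t: d_df[t] / (d_df[t] + r_df.get(t, 0.0)) for t in d_vocab}


# Toy corpora, invented for illustration only.
domain_docs = ["gas leak near the valve", "the valve failed during the test"]
reference_docs = ["the weather is nice", "the test was easy"]
scores = specificity(domain_docs, reference_docs)
domain_keywords = sorted(t for t, s in scores.items() if s == 1.0)
```

On these toy corpora, a domain word such as "valve" scores 1.0 while a function word such as "the" scores 0.5. A real pipeline would likely replace the whitespace tokenizer with the linguistic patterns the abstract mentions, which is also where multi-word expressions would come from.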
Processing case-law content for electronic publishing is a time-consuming activity that encompasses several sub-tasks and usually involves adding annotations to the original text. At the same time, recent advances in Artificial Intelligence and Natural Language Processing enable the automatic and efficient analysis of large volumes of text. In this paper we present our Machine Learning solution to three specific business problems that a real-world Italian publisher regularly meets in its day-to-day work: recognition of legal references in text spans, ranking of new content by relevance, and text classification according to a given tree of topics. We experimented with several approaches based on the BERT language model, together with alternatives typically based on Bag-of-Words. The optimal solution, deployed in a controlled production environment, was based on fine-tuned BERT in two out of three cases (extraction of legal references and text classification), while for relevance ranking a Random Forest model with hand-crafted features was preferred. We conclude by discussing the concrete impact of the developed prototypes, as perceived by the publisher.
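To make the Bag-of-Words alternative mentioned above concrete, here is a minimal sketch of a text classifier that represents each text as a word-count vector and assigns the topic whose aggregated training vector is closest by cosine similarity. This is an illustrative baseline only, not the publisher's system: the nearest-centroid method, the function names and the topic labels ("civil", "criminal") are assumptions, and the sketch uses a flat set of topics rather than the tree of topics the abstract describes.

```python
import math
from collections import Counter, defaultdict


def bow(text):
    """Bag-of-Words vector: word -> count, with naive whitespace tokenization."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(labeled_texts):
    """Sum the Bag-of-Words vectors of all training texts sharing a label."""
    centroids = defaultdict(Counter)
    for text, label in labeled_texts:
        centroids[label].update(bow(text))
    return dict(centroids)


def classify(text, centroids):
    """Return the label whose centroid is most similar to the text."""
    vec = bow(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy training data with hypothetical topic labels.
training = [
    ("contract breach damages claim", "civil"),
    ("theft sentence prison appeal", "criminal"),
]
centroids = train_centroids(training)
predicted = classify("appeal against prison sentence", centroids)
```

A baseline of this kind ignores word order and context, which is one plausible reason a fine-tuned language model could outperform it on classification, as the abstract reports.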