The current oil and gas market is characterized by low prices, high uncertainty and a consequent reduction in new investments. This leads to ever-increasing attention towards more efficient asset management. Fouling is considered one of the main problems drastically affecting asset integrity/efficiency and the heat-exchange performance of critical machinery in upstream production plants. This paper illustrates the application of advanced big data analytics and innovative machine learning techniques to address this challenge. Optimal maintenance scheduling and the early identification of workflow-blocking events strongly impact the overall production, as they heavily contribute to the reduction of downtime. While machine learning techniques have proved to bring significant advantages to these problems, they are fundamentally data-driven. In industrial scenarios, where dealing with a limited amount of data is standard practice, this means resorting to simpler models that are often not able to disentangle the real dynamics of the phenomenon. The lack of data is generally caused by frequent changes in operating conditions/field layout or by an insufficient instrumentation system. Moreover, the intrinsically long duration of many physical phenomena and the ordinary asset maintenance lifecycle cause a critical reduction in the number of relevant events that can be learned from. In this work, the fouling problem has been explored leveraging only limited data. The attention is focused on two different types of equipment: heat exchangers and re-boilers. The former involve slower dynamics, while the latter are characterized by a steady phase followed by an abrupt deterioration. Moreover, heat exchangers allow cleaning interventions to be scheduled in advance, whereas re-boilers force a much quicker plant stop. Finally, heat exchangers are characterized by a few episodes of comparable deterioration, while re-boilers present only a single episode. Regarding heat exchangers, a dual approach has been followed, merging a short-term, time-series-based model and a long-term one based on linear regression. After isolating a number of training regions related to fouling episodes that showed a characteristic behavior, it is possible to obtain accurate results in the short term and to capture the general trend in the long term. In the case of re-boilers, a novelty detection approach has been adopted: first, the model learns the equipment's normal behavior, then it uses the learned features to detect anomalies. This continuous training-predicting iteration also leverages user feedback to adapt to new operating conditions. Results show that in a "young digital" industry, the use of limited data together with simpler machine learning techniques can successfully become an automatic diagnostics tool supporting operators in improving traditional maintenance activities, optimizing the production rate and, ultimately, the asset efficiency.
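A minimal sketch of how a short-term and a long-term component could be merged to forecast a fouling indicator (e.g. a heat-transfer coefficient). The abstract does not specify the models or the blending scheme, so the linear trend, the autoregressive correction on residuals, the blending weight and all variable names below are illustrative assumptions.

```python
# Hedged sketch: long-term linear trend + short-term autoregressive correction
# for a fouling indicator. Not the paper's actual implementation.
import numpy as np
from sklearn.linear_model import LinearRegression

def forecast_fouling(history, horizon=24, ar_order=6, blend=0.5):
    """history: 1-D array of the fouling indicator, one sample per hour."""
    t = np.arange(len(history)).reshape(-1, 1)

    # Long-term component: linear regression over the whole training region.
    long_term = LinearRegression().fit(t, history)
    t_future = np.arange(len(history), len(history) + horizon).reshape(-1, 1)
    lt_pred = long_term.predict(t_future)

    # Short-term component: autoregressive model fitted on recent residuals.
    resid = history - long_term.predict(t)
    X = np.column_stack([resid[i:len(resid) - ar_order + i] for i in range(ar_order)])
    y = resid[ar_order:]
    ar = LinearRegression().fit(X, y)

    st_pred = []
    window = list(resid[-ar_order:])
    for _ in range(horizon):
        nxt = ar.predict(np.array(window).reshape(1, -1))[0]
        st_pred.append(nxt)
        window = window[1:] + [nxt]

    # Blend: the short-term correction dominates near t0 and decays afterwards.
    decay = blend ** np.arange(1, horizon + 1)
    return lt_pred + decay * np.array(st_pred)

# Example usage with synthetic, slowly degrading data.
rng = np.random.default_rng(0)
series = 1.0 - 0.001 * np.arange(500) + 0.01 * rng.standard_normal(500)
print(forecast_fouling(series, horizon=12)[:3])
```

The decaying weight reflects the idea stated above: the time-series component gives accurate short-term results, while the regression captures the general long-term trend.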
In Oil and Gas, technical staff are involved daily in critical activities. Safety is therefore a key priority, even more so as frontier, continuously evolving technologies become a fundamental part of the transformation of traditional industrial processes. While safety reports and investigations have long been adequately stored and continuously monitored by expert professionals, Artificial Intelligence applications to natural language now provide the opportunity to develop a decision support system capable of extracting insights, predicting the risk of future operations, performing scenario analysis and prescribing risk-mitigation actions on massive amounts of data. In this work, we used an Open Innovation approach to develop a Safety Pre-Sense system, leveraging Machine Learning and Natural Language Processing techniques and incorporating multiple different (and often unexpected) sources of information. Starting from standard Natural Language Processing tasks, we leverage linguistic patterns to build binary Document-Term Matrices. Operating on these matrices, we implemented a Domain Keyword Extraction algorithm to extract words (or multi-word expressions) with high specificity. Our pipeline also provides a language-agnostic method to detect similarities between documents written in different languages and cluster them accordingly, in order to obtain clear descriptors that can be used to understand their meaning. To do so, we map our texts into a high-dimensional vector space where we apply cluster analysis to group documents that are semantically close into consistent, multilingual groups. We then extract, for each language, a list of domain keywords that characterize every cluster. Next, we identify similarities in the data in a completely data-driven manner, with the objective of extracting correlations between event features (such as geographical location and cause or type of event). As a result, we extract new aggregations of complex items such as severe Accidents or Work Processes. We also demonstrate how Correspondence Analysis and Pattern Mining algorithms are able to extract and visualize correlations between topics and events, leveraging a dynamic Qlik dashboard. Finally, we point to additional sources of information, both internal and external to our company, that can be used to enhance our analysis.
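A minimal sketch of two of the steps described above: building a binary Document-Term Matrix and scoring candidate domain keywords by how specific they are to the safety corpus. The scoring formula (relative document frequency against a background corpus) and all example documents are illustrative assumptions, not the paper's exact Domain Keyword Extraction algorithm.

```python
# Hedged sketch: binary Document-Term Matrix + a simple domain-specificity score.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

safety_reports = [
    "gas leak detected near compressor during maintenance",
    "worker slipped on platform stairs, no injury reported",
    "pressure relief valve failure during compressor start-up",
]
background_docs = [
    "quarterly financial results were presented to the board",
    "the new office building opened last week",
]

# Binary DTM over unigrams and bigrams (multi-word expressions).
vectorizer = CountVectorizer(binary=True, ngram_range=(1, 2))
dtm = vectorizer.fit_transform(safety_reports + background_docs).toarray()
terms = vectorizer.get_feature_names_out()

n_dom = len(safety_reports)
df_dom = dtm[:n_dom].mean(axis=0)      # document frequency in the domain corpus
df_bg = dtm[n_dom:].mean(axis=0)       # document frequency in the background corpus
specificity = df_dom / (df_bg + 1e-6)  # high when a term is domain-specific

top = np.argsort(specificity)[::-1][:10]
print([terms[i] for i in top])
```

The same matrices can feed the downstream clustering step, where each document is mapped into a vector space and grouped with semantically close documents regardless of language.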
The use of advanced analytics techniques has become pivotal for the Digital Transformation of the Oil and Gas industry. Most of these models are used to predict and avoid off-spec behaviors of both equipment and functional units of the plant; predicting overshooting events in advance allows plant operators to avoid production downtime. From a Machine Learning perspective, predicting off-spec situations and peaks in a time signal is a complex task, due to the rarity of such events. For the very same reason, standard data science measures – like Area Under the Curve (AUC), Recall and Precision – can lead to misleading performance indicators. In fact, a model that predicts no off-spec events would still achieve a high AUC simply because of the unbalanced classes, while in practice leading to many false alarms. In this paper we present a business-oriented validation framework for big data analytics and machine learning models applied to an upstream production plant. It allows evaluating both the effort required of operators and the expected benefit that could be achieved. The validation metrics defined bring the classical Data Science measures into the business domain, making it possible to adapt the model to the specific use case and end user while addressing the specific constraints of upstream plants. The framework allows defining the optimal trade-off between the effort required and the preventable events, providing statistics and KPIs to evaluate it. The Normalized Recall (NR) takes into account both the percentage of events intercepted and the effort required in terms of Attention Time (AT), i.e. the time during which the operator should pay attention to the equipment involved. Plant operators can now have an idea of the results they can achieve with respect to the maximum effort required. Moreover, to assess the quality of the model, we defined the lift in the NR as the ratio between the model's NR and the NR that would be obtained by randomly distributing the same number of alarms. We applied this framework to specific use cases, obtaining an expected recall of 40-50% with an expected effort of 5-10% of the time (over a period of more than 6 months). The actual effort is lower, since the operator is not required to be fully committed to the alarm. The innovative framework developed is able to demonstrate the real operating capability of the analytics implemented in the field, highlighting both the effort required of operators and the accuracy of the machine learning tools.
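A minimal sketch of how such a validation could be computed from alarm and event timestamps. The abstract does not give the Normalized Recall formula, so the formalization NR = recall * (1 - AT), the interception rule and the Monte Carlo estimate of the random baseline below are all illustrative assumptions.

```python
# Hedged sketch: recall, Attention Time (AT), an assumed Normalized Recall (NR)
# and the lift against randomly placed alarms.
import numpy as np

def evaluate(alarm_times, event_times, horizon, total_time):
    """alarm_times, event_times: timestamps in hours.
    An event counts as intercepted if an alarm fired within `horizon` hours before it."""
    alarm_times = np.asarray(alarm_times, dtype=float)
    intercepted = sum(
        np.any((alarm_times <= e) & (alarm_times >= e - horizon))
        for e in event_times
    )
    recall = intercepted / len(event_times)
    # Attention time: union of [alarm, alarm + horizon] windows, as a fraction of total time.
    covered = np.zeros(int(total_time), dtype=bool)
    for a in alarm_times:
        covered[int(a):int(min(a + horizon, total_time))] = True
    at = covered.mean()
    return recall, at, recall * (1 - at)   # assumed NR formalization

def random_lift(n_alarms, event_times, horizon, total_time, model_nr, n_trials=200):
    """Lift of the model's NR over the NR of randomly distributed alarms."""
    rng = np.random.default_rng(0)
    random_nr = np.mean([
        evaluate(rng.uniform(0, total_time, n_alarms), event_times, horizon, total_time)[2]
        for _ in range(n_trials)
    ])
    return model_nr / max(random_nr, 1e-9)

# Example: 4 alarms and 3 events over roughly 6 months of hourly data.
alarms = [100, 850, 2300, 3900]
events = [110, 2305, 4000]
recall, at, nr = evaluate(alarms, events, horizon=12, total_time=4380)
print(recall, at, nr, random_lift(len(alarms), events, 12, 4380, nr))
```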
This paper highlights the results of a first campaign of tests of an innovative tool to predict the short-term trend of the energy efficiency index and support the optimal management of an Oil&Gas production plant. The developed tool represents a step towards the Digital Transformation of production plants through the integration of Big Data Analytics and Machine Learning methodologies with experts' domain knowledge. The predictive model has two main features: it produces a forecast of the energy efficiency index 3 hours into the future, and it supports the site engineers in making optimal management choices by highlighting the energy performance of the most energy-intensive equipment of the plant. The operator can use this information to act on the plant and reduce the overall energy consumption. The energy efficiency index is described by the Stationary Combustion CO2 Emission KPI [tCO2/kboe], which relates the consumed energy and the associated CO2 emissions to the total production. A Gradient Boosting regression model is implemented and fed with a mix of autoregressive and exogenous real-time parameters. The paper shows the results obtained through a series of actions taken by production engineers, using the model's outputs on a real operating oilfield. The methodology entails a real-time run of the model to analyze the Stationary Combustion CO2 Emission Index trend, identifying a subset of equipment with abnormal/atypical consumption. This activity is followed by a monitoring phase, where such a subset of equipment is further analyzed in terms of energy consumption and main process parameters. The purpose of this phase is to identify a relevant set of actions. The final step requires interaction with the control room to act on the equipment's operative parameters. Preliminary tests show that the daily average CO2 emissions from stationary combustion were reduced by 0.9%, with a peak reduction of 1.35%. The main advantages observed in the first implementation tests are the significant reduction of CO2 emissions while maintaining the highest level of production, enabling a step towards the field's carbon-neutrality target.
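A minimal sketch of the forecasting setup described above: a Gradient Boosting regressor fed with autoregressive lags of the KPI plus exogenous real-time parameters, trained to predict the value 3 hours ahead. Column names, lag choices and hyperparameters are illustrative assumptions, and synthetic data stands in for the plant measurements.

```python
# Hedged sketch: 3-hour-ahead forecast of the Stationary Combustion CO2 Emission KPI
# with Gradient Boosting on autoregressive and exogenous features.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

def make_dataset(df, target="co2_kpi", horizon=3, lags=(1, 2, 3, 6)):
    """df: hourly DataFrame with the KPI and exogenous process parameters."""
    feats = pd.DataFrame(index=df.index)
    for lag in lags:                                  # autoregressive terms
        feats[f"{target}_lag{lag}"] = df[target].shift(lag)
    for col in df.columns.drop(target):               # exogenous real-time terms
        feats[col] = df[col]
    y = df[target].shift(-horizon)                    # value 3 hours into the future
    data = pd.concat([feats, y.rename("y")], axis=1).dropna()
    return data.drop(columns="y"), data["y"]

# Synthetic hourly data standing in for plant measurements.
rng = np.random.default_rng(1)
n = 1000
df = pd.DataFrame({
    "co2_kpi": 10 + np.sin(np.arange(n) / 24) + 0.1 * rng.standard_normal(n),
    "fuel_gas_rate": 5 + 0.2 * rng.standard_normal(n),
    "total_production": 100 + rng.standard_normal(n),
})

X, y = make_dataset(df)
split = int(0.8 * len(X))
model = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05, max_depth=3)
model.fit(X.iloc[:split], y.iloc[:split])
print("holdout R2:", model.score(X.iloc[split:], y.iloc[split:]))
```

In operation, the forecast and the per-equipment energy consumption would be refreshed in real time so that engineers can spot abnormal consumers and pass corrective set-points to the control room.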
As industrial plants evolve towards massive digitalization and their "digital twin" architectures are constantly enriched with a wide range of advanced analytics solutions that are becoming part of ordinary control operations, the quality of the input data becomes of paramount importance. As the number of variables, KPIs and processes monitored in plants rises by several orders of magnitude, automated procedures are needed for effective monitoring. We have developed a novelty detection framework with the objective of monitoring each and every variable in the plant, detecting anomalies in real time and therefore allowing the operators to investigate the possible causes in more depth before any damage is done. To do so, we had to define the "normality" of a signal, which can vary heavily from one signal to another. We therefore set up a learning procedure that integrates several steps. First, we need to distinguish the normal periods from the anomalous ones. While some failures are known to have impacted certain sections of the plant at certain times, there will certainly be more anomalies that have remained hidden so far. We hence labelled each timestamp of the series using an isolation forest algorithm. Using the resulting normal dataset, we then extracted for each sensor the features that we identified as best characterizing a time series in its normal operating conditions. First, we select the plant signals that are most correlated with the one at hand, fit a Ridge Regression, and estimate the residual distribution. Then, we extract statistics such as mean, standard deviation, mean length and frequency of frozen periods, outliers, NaNs, the Fast Fourier Transform and Welch's method for spectral density estimation. Finally, we heuristically define a number of tests capable of distinguishing a normal from an anomalous time window. With this procedure, we are able to detect in real time whenever a signal is behaving differently from the way it is expected to. Depending on the operator's experience, it is then possible to understand whether the cause is a sensor malfunction or whether it indicates that something is wrong with the physical phenomenon, leading to different actions, such as maintenance rescheduling. We have therefore conceived a dashboard that allows each operator to provide feedback, producing a refined dataset ready for continuous retraining. Ultimately, this anomaly detection framework will also be used to filter the inputs to many advanced analytics solutions, guaranteeing the quality of their results.
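A minimal sketch of two steps of the procedure above: labelling timestamps as normal/anomalous with an isolation forest, then regressing the target sensor on its most correlated peers with Ridge Regression and using the residual distribution of the normal periods to flag deviations. Window sizes, the contamination rate, the 3-sigma test and all signal names are illustrative assumptions.

```python
# Hedged sketch: isolation-forest labelling + Ridge residual test for one sensor.
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import Ridge

def label_normal(signal, window=24, contamination=0.02):
    """Label each timestamp of a single signal as normal (True) or anomalous."""
    feats = pd.DataFrame({
        "mean": signal.rolling(window).mean(),
        "std": signal.rolling(window).std(),
        "value": signal,
    }).dropna()
    iso = IsolationForest(contamination=contamination, random_state=0)
    labels = pd.Series(iso.fit_predict(feats), index=feats.index)
    return labels == 1

def residual_test(df, target, normal_mask, n_peers=2):
    """Fit Ridge on the most correlated peer signals over normal periods,
    then flag points whose residual exceeds an assumed 3-sigma threshold."""
    corr = df.corr()[target].drop(target).abs().sort_values(ascending=False)
    peers = corr.index[:n_peers]
    train = df.loc[normal_mask]
    model = Ridge(alpha=1.0).fit(train[peers], train[target])
    resid = df[target] - model.predict(df[peers])
    mu, sigma = resid[normal_mask].mean(), resid[normal_mask].std()
    return np.abs(resid - mu) > 3 * sigma

# Synthetic plant signals: sensor_a depends on sensor_b and drifts at the end.
rng = np.random.default_rng(2)
n = 2000
b = np.sin(np.arange(n) / 50) + 0.05 * rng.standard_normal(n)
a = 2 * b + 0.05 * rng.standard_normal(n)
a[-100:] += 0.8                             # injected sensor fault
df = pd.DataFrame({"sensor_a": a, "sensor_b": b, "sensor_c": rng.standard_normal(n)})

normal = label_normal(df["sensor_a"]).reindex(df.index, fill_value=False)
flags = residual_test(df, "sensor_a", normal)
print("flagged points:", int(flags.sum()))
```

In the framework described above, tests like this one run per signal in real time, and operator feedback collected through the dashboard refines the normal dataset used for continuous retraining.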