The oil and gas industry has evolved towards digitalization, and data are now used extensively for decision making, cost optimization, efficiency improvement, and productivity gains. The upstream sector generates a huge volume of operation and production data on real-time platforms. Manually quality checking and analyzing all available data is tedious, impractical, and inefficient (Subrahmanya et al., 2014). Machine learning algorithms can improve on this by automating data quality checks at scale. In addition, imputation can be implemented to substitute missing data and to forecast future values in real time.
In this study, a large volume of data was collected in real time from more than 30,000 tags/sensors. The data were recorded at intervals as short as seconds, and every collected data point needed to be quality checked. First, each equipment tag/sensor was checked and mapped against the P&ID drawings. Then, an API was developed to connect with the real-time platform. A percentile-based machine learning approach was then developed and applied to quality check the operation and production time-series data at scale. Lastly, the process was customized for other offshore platforms in the field. In addition to automated data quality checking, machine learning algorithms were used to estimate missing values based on the underlying relationships between data points. These approaches reduce the time needed to maintain high-quality, reliable data for further analysis and use.
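The percentile check and the gap filling described above can be illustrated with a minimal sketch. It is not the study's implementation: the function names, the 1st/99th percentile limits, and the use of a historical reference window are all assumptions made for illustration, and plain linear interpolation stands in for the machine learning imputation, whose details the abstract does not give.

```python
from statistics import quantiles
from typing import List, Optional, Tuple

Reading = Optional[float]  # None marks a missing sample


def percentile_bounds(history: List[Reading],
                      lower_pct: int = 1,
                      upper_pct: int = 99) -> Tuple[float, float]:
    """Return (low, high) limits taken from percentiles of a reference window.

    The percentile choices are hypothetical defaults, not values from the study.
    """
    clean = [v for v in history if v is not None]
    # quantiles(..., n=100) returns 99 cut points; cuts[k - 1] is the k-th percentile.
    cuts = quantiles(clean, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


def flag_outliers(values: List[Reading], low: float, high: float) -> List[bool]:
    """Mark readings that fall outside the percentile limits (missing is not flagged here)."""
    return [v is not None and (v < low or v > high) for v in values]


def interpolate_missing(values: List[Reading]) -> List[Reading]:
    """Fill interior gaps by linear interpolation between the nearest valid neighbours.

    Leading and trailing gaps are left as None because they have only one neighbour.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for a, b in zip(known, known[1:]):
        step = (values[b] - values[a]) / (b - a)
        for i in range(a + 1, b):
            filled[i] = values[a] + step * (i - a)
    return filled


if __name__ == "__main__":
    history = [float(x) for x in range(101)]      # stand-in for one tag's past readings
    low, high = percentile_bounds(history)
    print(flag_outliers([50.0, None, 120.0, -3.0], low, high))
    print(interpolate_missing([10.0, None, None, 13.0, None]))
```

Deriving the limits from a reference window rather than from the batch being checked avoids flagging the extremes of every batch by construction; in practice the limits would be maintained separately for each tag.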
As a result, the percentile-based approach successfully automated the data quality check process, improving productivity and efficiency. Percentiles were applied to understand, validate, and monitor data at scale. Anomalies were detected in real time, allowing operators to investigate possible faults, damage, or losses. All outliers and missing or incorrect data were also recorded and visualized in a dashboard. The model also provides additional statistics to define stale and bad data, on top of the automatically defined parameters. These features have improved the efficiency of data acquisition and preparation.
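As an illustration of how stale and bad data might be labelled, the sketch below assumes two working definitions that the abstract does not state: a sample is "bad" when it is missing or non-finite, and "stale" when the same value has repeated for a configurable number of consecutive samples. The function name and the `stale_after` threshold are hypothetical.

```python
import math
from typing import List, Optional


def classify_samples(values: List[Optional[float]], stale_after: int = 5) -> List[str]:
    """Label each sample as "good", "stale", or "bad".

    Assumed definitions (not taken from the study):
      bad   -> missing (None) or non-finite (NaN, +/-inf)
      stale -> the value has been identical for `stale_after` or more consecutive samples
    """
    labels: List[str] = []
    run = 0                        # length of the current run of identical values
    prev: Optional[float] = None
    for v in values:
        if v is None or not math.isfinite(v):
            labels.append("bad")
            run, prev = 0, None    # a bad sample breaks any run of repeats
            continue
        run = run + 1 if v == prev else 1
        prev = v
        labels.append("stale" if run >= stale_after else "good")
    return labels


if __name__ == "__main__":
    readings = [1.0, 1.0, 1.0, 2.0, None, float("nan"), 3.0]
    print(classify_samples(readings, stale_after=3))
```

Counts of each label per tag are the kind of statistic that could feed a dashboard alongside the outlier flags.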
In conclusion, the model assists operators in monitoring daily operation and production data efficiently. Data quality and reliability are key factors in asset management, ensuring that operators can trust the data produced. The quality-checked data can be used for further analysis, troubleshooting, and decision making.