Analyzing large volumes of continuous real-time data while sustaining its reliability is a challenging task for engineers. Unreliable data can produce misleading analysis results and make critical decisions in the oil and gas business difficult, especially when the data are processed continuously in real time. One example is the continuous real-time data coming from Intelligent Field equipment. Upstream Intelligent Field equipment can be a combination of wellhead sensors, ESP, PDHMS, MPFM, SWC, SPFM, MOVs, and H2S sensors. Intelligent Field data follows specific transmission nodes as it flows from each sensor or instrument to a Remote Terminal Unit (RTU), to the Supervisory Control and Data Acquisition (SCADA) system, to a Plant Information (PI) data historian, and, after appropriate filtration, into the Exploration and Production (E&P) Corporate Database (Oracle), and finally into applications developed by Petroleum Engineering. Data can become unreliable at any transmission node along this path. To discover the root cause, the engineer must go through an exhaustive and lengthy process. This paper therefore introduces a methodology that tackles this challenge by using machine learning methods to develop pattern recognition algorithms that recognize the transmission nodes where the unreliability of data appeared. It addresses two main challenges: (1) estimating data reliability and (2) detecting the error location of unreliable data.
Of three candidate machine learning algorithms, two were selected: Decision Tree and Gradient Boosting (GB). Decision Tree achieved an accuracy of 94.6%, while Gradient Boosting achieved a higher accuracy of 96.4% in estimating data reliability and determining error location. GB's advantage over a single decision tree is that it builds an ensemble of trees sequentially, each new tree trained to correct the errors of the trees before it, whereas a decision tree is trained once as a standalone model. This stepwise error correction reduces the error substantially.
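How gradient boosting combines trees can be illustrated with a toy sketch. In the standard formulation, trees are added one at a time, each fit to the residual errors of the ensemble built so far. The example below is an assumption-laden illustration, not the paper's implementation: it uses synthetic one-dimensional regression data, depth-1 trees (stumps), squared error, and the hypothetical helper names `fit_stump`, `fit_boosted`, and `mse`. The paper's actual features, labels, and model settings are not given in this abstract.

```python
import math

# Toy illustration only: synthetic data, not the paper's dataset or code.

def fit_stump(xs, ys):
    """Fit a depth-1 regression tree (one split) that minimizes squared error."""
    best = None
    # Candidate thresholds: every distinct x except the largest, split is x <= t.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean

def fit_boosted(xs, ys, n_trees=20, learning_rate=0.5):
    """Gradient boosting for squared error: each stump fits the current residuals."""
    base = sum(ys) / len(ys)          # initial prediction: the mean target
    trees = []
    preds = [base] * len(xs)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)   # new tree corrects the remaining error
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(t(x) for t in trees)

def mse(model, xs, ys):
    """Mean squared error of a model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic target: a sine curve sampled on [0, 3.9].
xs = [i / 10 for i in range(40)]
ys = [math.sin(x) for x in xs]

single = fit_stump(xs, ys)      # one tree trained once
boosted = fit_boosted(xs, ys)   # 20 trees trained sequentially on residuals

single_mse = mse(single, xs, ys)
boosted_mse = mse(boosted, xs, ys)
print(f"single stump training MSE: {single_mse:.4f}")
print(f"boosted stumps training MSE: {boosted_mse:.4f}")
```

On this toy problem the boosted ensemble reaches a lower training error than the single stump, because each added tree removes part of the error left by its predecessors. Training error alone says nothing about accuracy on unseen data, which would need a held-out set.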