In the last few years, industry interest in applying Machine Learning (ML) algorithms to improve business decisions and operational efficiency has grown steadily. The drivers are the 3V's (velocity, variety and volume) of data acquisition and synthesis. Making sense of this volume of data is either too cumbersome for direct human interpretation or prohibitively time consuming (and often impractical) for physics-based models. Machine Learning techniques systematically unravel the underlying trends and interrelationships between the driver and response variables. However, the application of these data science techniques is still relatively new in the petroleum industry and requires careful selection and adaptation to improve forecasting success. This paper applies some of these techniques, especially deep and shallow learning algorithms, in a systematic manner, following a step-by-step methodology of data preparation, exploratory data analysis, model selection, model validation, model parameter tuning, selection of important variables and model application. In particular, data sets are prepared for both supervised regression (continuous) and classification (categorical) methods. After exploratory data analysis, multivariate regression, together with multicollinearity/Variance Inflation Factor and outlier tests, is applied to reduce the predictor variable list. Thereafter, classification models, e.g. Gradient Boosting, Support Vector Machine, k-Nearest Neighbors, Decision Trees and Random Forest, are progressively trained on the training data sets and tested on the hold-out data sets. Predictive accuracy is compared using standard goodness-of-fit metrics. Finally, stratified k-fold cross-validation is applied to tune model parameters and list the important variables.
First, the Shallow and Deep Learning process flow is applied to a large Delaware Basin data set comprising 5716 horizontal wells distributed across the various members of the Wolfcamp formation. The original database contains a total of 131 predictor variables: 26 reservoir, 21 completion, 22 well architecture, 53 production and 9 reservoir fluid related. The dataset is mined for individual Wolfcamp members. Results are provided to demonstrate the models' predictive accuracy, their applicability to a new dataset and the potential pitfalls in forecasting if certain statistical metrics are ignored. The important variables (in the statistically reduced dataset) that are assigned more weight in the predictive process are also listed. Next, as a second case study, a different Deep Learning method (Long Short-Term Memory, LSTM) is applied to history match and forecast an Eagle Ford well decline curve, to demonstrate the viability of this method in forecasting production.
The paper contributes towards a better understanding of some of the ubiquitous black-box ML algorithms, defines an appropriate process flow for analyzing large datasets, and helps petroleum engineers and geoscientists apply these algorithms more rigorously and robustly in their own applications.
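
As an illustration of the predictor-reduction step mentioned in the abstract, the following is a minimal sketch of Variance Inflation Factor (VIF) screening. It is not the paper's implementation: the cutoff of 10 is a common rule of thumb assumed here, the column names are hypothetical, and the data are synthetic. Each predictor is regressed on the remaining predictors, and VIF = 1 / (1 - R^2) is computed from that fit; the predictor with the largest VIF is dropped until all VIFs fall below the cutoff.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of each column of X (shape: n_samples x n_features).

    Assumes no column is constant (a constant column has zero variance,
    so its R^2 is undefined).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Ordinary least squares of column j on the other columns plus an intercept.
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


def drop_collinear(X, names, threshold=10.0):
    """Iteratively drop the predictor with the largest VIF above `threshold`.

    The default threshold of 10 is an assumed rule of thumb, not a value
    taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        vifs = variance_inflation_factors(X)
        worst = int(np.argmax(vifs))
        if vifs[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return X, names


# Synthetic example with hypothetical predictor names: fluid_volume is
# constructed to be almost a linear function of proppant_mass.
rng = np.random.default_rng(0)
lateral_length = rng.normal(size=200)
proppant_mass = rng.normal(size=200)
fluid_volume = 2.0 * proppant_mass + rng.normal(scale=0.05, size=200)

X = np.column_stack([lateral_length, proppant_mass, fluid_volume])
X_reduced, kept = drop_collinear(
    X, ["lateral_length", "proppant_mass", "fluid_volume"]
)
print("Predictors kept:", kept)
```

In this sketch one of the two nearly collinear predictors is removed and the independent predictor is retained. In practice the cutoff and the order of removal are judgment calls that should be checked against domain knowledge.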