Drilling rigs are an expensive resource in the oil and gas industry, so their time must be planned carefully. Estimating activity durations plays a crucial role in rig planning, and the selection of correlated wells to feed the estimation models is vital for obtaining good duration estimates. To address this need, we present a supervised regression method for clustering wells and compare its results with those of traditional unsupervised clustering methods.
The developed method transforms an unsupervised machine learning problem (without a target) into a supervised regression problem (with a target). We use decision tree regression to perform this transformation and compare the resulting clustering with two other methods based on K-means (unsupervised). To evaluate the clusters, we apply multiple linear regression with leave-one-out cross-validation within each resulting cluster.
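As a rough illustration of this idea, the sketch below treats the leaves of a fitted regression tree as clusters and builds a K-means baseline with the same number of groups. It is a minimal sketch and not the implementation used in the study: it assumes scikit-learn, uses synthetic data, and the names `tree_clusters` and `max_clusters`, the cap on the number of leaves, and the standardization applied before K-means are all hypothetical choices made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor


def tree_clusters(X, y, max_clusters=4, random_state=0):
    """Group samples by the leaf they reach in a regression tree fitted on y."""
    tree = DecisionTreeRegressor(max_leaf_nodes=max_clusters,
                                 random_state=random_state)
    tree.fit(X, y)
    leaf_ids = tree.apply(X)  # index of the leaf each sample ends up in
    _, labels = np.unique(leaf_ids, return_inverse=True)  # relabel as 0..k-1
    return labels


# Synthetic stand-in data (NOT from the study): two well attributes and a duration.
rng = np.random.default_rng(0)
n_wells = 60
X = rng.uniform(low=[100.0, 1000.0], high=[2500.0, 6000.0], size=(n_wells, 2))
duration = 10.0 + 0.01 * X[:, 0] + 0.005 * X[:, 1] + rng.normal(0.0, 2.0, n_wells)

# Supervised clustering: the target (duration) drives where the tree splits.
supervised_labels = tree_clusters(X, duration, max_clusters=4)
n_clusters = int(supervised_labels.max()) + 1

# Unsupervised baseline: K-means sees only the (standardized) attributes.
X_scaled = StandardScaler().fit_transform(X)
kmeans_labels = KMeans(n_clusters=n_clusters, n_init=10,
                       random_state=0).fit_predict(X_scaled)
```

Because the tree chooses its splits to reduce the variance of the target, wells that share a leaf tend to have similar durations, whereas K-means groups wells only by their proximity in attribute space.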
We tested the methodology in eight scenarios that differ in the number of wells (42 to 236) and in the similarity among the wells. All scenarios were based on real wells from a large Brazilian energy company. The target for the decision tree regression was the drilling duration, and the predictor variables were water depth and well footage. We imposed a minimum number of wells per cluster to reduce overfitting in the estimation technique. Our methodology reached the lowest mean absolute percentage error in all scenarios. The decision tree regression methodology can therefore improve the expert's ability to select similar wells based on their characteristics and duration.
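The evaluation step described above can be sketched as follows: a regression tree with a minimum leaf size defines the clusters, and a multiple linear regression is scored inside each cluster with leave-one-out cross-validation and the mean absolute percentage error. This is only a sketch under stated assumptions: it relies on scikit-learn, the data are synthetic, the minimum of 10 wells per cluster is an arbitrary value, the helper names `loo_mape` and `clustered_mape` are hypothetical, and averaging the per-cluster errors weighted by the number of wells is an assumed aggregation, not necessarily the one used in the study.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.tree import DecisionTreeRegressor


def loo_mape(X, y):
    """MAPE (%) of a multiple linear regression under leave-one-out CV."""
    y_pred = cross_val_predict(LinearRegression(), X, y, cv=LeaveOneOut())
    return float(np.mean(np.abs((y - y_pred) / y)) * 100.0)


def clustered_mape(X, y, labels):
    """Per-cluster leave-one-out MAPE, averaged with weights = wells per cluster."""
    total = 0.0
    for cluster in np.unique(labels):
        mask = labels == cluster
        total += loo_mape(X[mask], y[mask]) * mask.sum()
    return total / len(y)


# Synthetic stand-in data (NOT from the study); durations are strictly positive.
rng = np.random.default_rng(1)
n_wells = 80
water_depth = rng.uniform(100.0, 2500.0, n_wells)
footage = rng.uniform(1000.0, 6000.0, n_wells)
X = np.column_stack([water_depth, footage])
duration = 10.0 + 0.01 * water_depth + 0.005 * footage + rng.normal(0.0, 2.0, n_wells)

# Clusters are the tree leaves; min_samples_leaf enforces a minimum cluster size
# (10 is an arbitrary illustrative value).
tree = DecisionTreeRegressor(min_samples_leaf=10, random_state=0).fit(X, duration)
_, labels = np.unique(tree.apply(X), return_inverse=True)

mape = clustered_mape(X, duration, labels)
print(f"{labels.max() + 1} clusters, leave-one-out MAPE = {mape:.2f}%")
```

The minimum leaf size keeps every cluster large enough for the within-cluster regression to be fitted on more observations than it has coefficients, which is the overfitting concern the constraint is meant to address.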
The novelty of applying decision tree regression to a clustering problem is that it gives experts a tool that considers the operational duration in addition to the traditional set of relevant variables, thereby transforming an unsupervised problem into a supervised regression problem. With the decision tree regression method, wells can be clustered into more homogeneous groups, which benefits any subsequent analysis.