Experimental approaches for determining asphaltene precipitation
in the laboratory are time-consuming and expensive because they consume
large amounts of solvent. Robust, reliable, fast,
and economical predictive tools are therefore needed to forecast the amount of asphaltene
precipitation over a wide range of pressures, temperatures, operational
parameters, and petroleum fluid properties. The main
objective of this research work was to develop machine learning models
using experimental data to predict asphaltene precipitation amount
due to titration. After collecting 1439 data samples from 27 experimental
research works, a quality check was performed for possible logical
filling of the missing values and detecting the problematic data samples.
Three categories of input parameters (operational parameters, oil properties, and gas
properties) were identified as the most influential.
The database used in this work is the largest reported so far
in the literature. In addition, pressure is treated as one of the
major parameters in this work; it was not considered in the previously
reported models (i.e., all were limited to ambient pressure).
For the first time, 39 different oil samples were considered in the
modeling (i.e., the existing works mostly cover a single oil sample).
We proposed new indices in the modeling to account for different oil
types and n-alkanes. Because of the distribution of the pressure data,
the database was split into two clusters. Each cluster went through
several statistical preprocessing stages including treating duplicates
and zero-variance features, imputing the missing data, assessing the
collinearity, feature selection, and data splitting and scaling. Then,
five different models, multilayer perceptron (MLP), support vector
machine (SVM), decision tree (DT), random forest (RF), and committee
machine intelligent system (CMIS), were used for model development.
Based on the results, the RF was identified as the best predictor
for both clusters and, consequently, for the whole database, with root-mean-square
error (RMSE) and R² values of 0.94 and
0.97, respectively, for the testing data set. The developed models
can be used to accurately predict asphaltene precipitation by n-alkane
titration over a wide range of pressures and crude oil properties.
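
As an illustration only, the preprocessing stages and the two reported metrics can be sketched in plain Python. This is a minimal sketch on toy data, not the authors' code: all function names are hypothetical, missing values are assumed to be encoded as None, median imputation and min-max scaling are assumed choices (the abstract does not say which methods were used), and the collinearity check, feature selection, and model training steps are omitted.

```python
import math
import random
import statistics

def drop_duplicates(rows):
    """Keep the first occurrence of each identical row."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def impute_median(rows):
    """Replace None with the column median of the observed values.
    Assumes every column has at least one observed value."""
    cols = list(zip(*rows))
    medians = [statistics.median([v for v in col if v is not None]) for col in cols]
    return [[m if v is None else v for v, m in zip(row, medians)] for row in rows]

def drop_zero_variance(rows):
    """Remove columns whose values are all identical; also return kept indices."""
    cols = list(zip(*rows))
    keep = [j for j, col in enumerate(cols) if len(set(col)) > 1]
    return [[row[j] for j in keep] for row in rows], keep

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle row indices reproducibly and split into train and test rows."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    n_test = max(1, round(len(rows) * test_fraction))
    test = [rows[i] for i in idx[:n_test]]
    train = [rows[i] for i in idx[n_test:]]
    return train, test

def fit_min_max(train_rows):
    """Compute per-column (min, max) on the training rows only."""
    return [(min(col), max(col)) for col in zip(*train_rows)]

def apply_min_max(rows, bounds):
    """Scale each column with the training bounds; constant columns map to 0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, bounds)]
        for row in rows
    ]

def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

The scaling bounds are fitted on the training rows and then reused for the test rows, so that no information from the test set enters the scaler.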