Pipelines are critical arteries of the oil and gas industry, and building networks that safely transport hydrocarbons across diverse environments requires massive capital investment. These systems are nonetheless prone to integrity failures that cause significant economic losses and environmental damage. Accurately predicting pipeline failures from historical oil pipeline accident data enables asset managers to plan adequate maintenance, rehabilitation, and repair (MR&R) activities and prevent catastrophic failures. However, learning the complex interdependencies between pipeline attributes and rare failure events poses several analytical challenges. This study proposes a novel machine learning framework to predict pipeline failure causes on highly class-imbalanced data compiled by the United States Pipeline and Hazardous Materials Safety Administration (PHMSA). Natural language processing techniques, including sentiment analysis, were used to extract informative features from unstructured text data, giving the models a richer, more multifaceted representation of the text. Class imbalance in the dataset was addressed via oversampling and intrinsic cost-sensitive learning strategies adapted for the multi-class case. The cost-sensitive approach developed here integrates domain knowledge about the varying cost impacts of misclassifying different failure types into the machine learning models. Nine shallow and deep learning architectures were benchmarked, with LightGBM demonstrating superior performance. With cost-sensitive learning integrated, LightGBM achieved an F1 score of 86% and a Cohen's kappa of 0.82, improving on prior research efforts. SHapley Additive exPlanations (SHAP) analysis was used to interpret the LightGBM predictions, revealing the key factors driving failure probabilities.
The research demonstrated an effective fusion of text insights from inspection reports with structured pipeline data that enhances model interpretability. The resulting AI modeling framework generated data-driven predictions of failure causes, empowering transportation agencies with actionable insights. These insights enable tailored preventative maintenance decisions that proactively mitigate emerging pipeline failures.
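As an illustration only, the two quantitative ideas named above (cost-sensitive class weighting for imbalanced multi-class labels, and Cohen's kappa as an agreement metric) can be sketched in a few lines of standard-library Python. This is a minimal sketch under stated assumptions, not the study's implementation: the inverse-frequency weighting formula, the `cost_multipliers` argument, and the example failure-cause labels are all hypothetical choices made for this example.

```python
from collections import Counter


def class_weights(labels, cost_multipliers=None):
    """Inverse-frequency class weights, optionally scaled by a domain cost.

    Base weight: n_samples / (n_classes * class_count), so rare classes
    receive larger weights. `cost_multipliers` is a hypothetical hook for
    domain knowledge: a dict mapping a class to a factor that raises the
    penalty for misclassifying it. Classes not listed keep a factor of 1.0.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    cost_multipliers = cost_multipliers or {}
    return {
        cls: n_samples / (n_classes * count) * cost_multipliers.get(cls, 1.0)
        for cls, count in counts.items()
    }


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement and p_e the agreement expected by chance
    from the marginal label frequencies. Assumes p_e < 1, i.e. the labels
    are not all a single class in both sequences.
    """
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Illustrative, made-up label distribution (not PHMSA data).
example_labels = ["corrosion"] * 6 + ["equipment"] * 3 + ["excavation"] * 1
example_weights = class_weights(example_labels, {"excavation": 2.0})
```

Weights of this kind are typically passed to a classifier through a class-weight or sample-weight argument, so that errors on rare or costly failure causes contribute more to the training loss.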