Applications of artificial intelligence (AI) and natural language processing (NLP) have substantially improved safety and operational efficiency in the aviation sector. Airlines make informed decisions that enhance operational efficiency and strategic planning through extensive contextual analysis of customer reviews and feedback from social media platforms such as Twitter and Facebook. However, this kind of analysis is labor-intensive and time-consuming when performed manually. Numerous studies have investigated NLP algorithms for sentiment analysis of textual customer feedback, underscoring the need for an in-depth investigation of transformer-based NLP models. In this study, we explored the large language model BERT and three of its derivatives, fine-tuning them on an airline sentiment tweet dataset for downstream sentiment classification tasks. We further refined this fine-tuning by adjusting hyperparameters, improving the consistency and accuracy of the models' outcomes. RoBERTa emerged as the most accurate and overall most effective model in both the binary (96.97%) and tri-class (86.89%) sentiment classification tasks, and it continued to outperform the other models on the balanced dataset for tri-class classification; these results validate the application of BERT-family models to analyzing airline industry customer sentiment. In addition, this study identifies directions for future work, such as constructing more systematic and balanced datasets, applying other large language models, and using novel fine-tuning approaches. Our study serves as a benchmark for future exploration in customer sentiment analysis, with implications that extend from the airline industry to the broader transportation sector, where customer feedback plays a crucial role.
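
For readers who want a concrete sense of the fine-tuning workflow summarized above, the sketch below shows one way to fine-tune RoBERTa for tri-class airline tweet sentiment classification using the Hugging Face Transformers library. The dataset file name, column names, label mapping, and hyperparameter values are illustrative assumptions, not the exact configuration reported in the study.

```python
# Minimal sketch: fine-tuning RoBERTa for tri-class airline tweet sentiment
# classification with Hugging Face Transformers. File name, columns, and
# hyperparameters are assumptions for illustration, not the study's settings.
import pandas as pd
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

MODEL_NAME = "roberta-base"
LABELS = {"negative": 0, "neutral": 1, "positive": 2}  # tri-class setup

# Load the airline tweet dataset (assumed CSV with "text" and "sentiment" columns).
df = pd.read_csv("airline_tweets.csv")
df["label"] = df["sentiment"].map(LABELS)
dataset = Dataset.from_pandas(df[["text", "label"]]).train_test_split(test_size=0.2)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize(batch):
    # Tweets are short, so a modest maximum sequence length is sufficient.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS))

# The study tunes hyperparameters such as learning rate, batch size, and
# epochs; the values below are common starting points, not the reported optimum.
args = TrainingArguments(
    output_dir="roberta-airline-sentiment",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)

trainer.train()
print(trainer.evaluate())  # held-out accuracy/loss for the fine-tuned model
```

The same script can be adapted to the binary task by dropping the neutral class from the label mapping, or to the other BERT derivatives by swapping MODEL_NAME for the corresponding checkpoint.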