Existing feature-based neural approaches for aspect-based sentiment analysis (ABSA) try to improve their performance with pre-trained word embeddings and by modeling the relations between the text sequence and the aspect (or category), and thus depend heavily on the quality of word embeddings and on task-specific architectures. Although recently pre-trained language models, i.e., BERT and XLNet, have achieved state-of-the-art performance on a variety of natural language processing (NLP) tasks, they are still subject to three challenges: aspect specificity, local feature awareness, and task agnosticism. To address these challenges, this paper proposes an XLNet- and capsule-network-based model, XLNetCN, for ABSA. XLNetCN first constructs an auxiliary sentence to model the sequence-aspect relation and generate global aspect-specific representations, which enhances aspect awareness and ensures the full pre-training of XLNet is exploited, improving its task-agnostic capability. XLNetCN then employs a capsule network with the dynamic routing algorithm to extract the local and spatial hierarchical relations of the text sequence and yield its local feature representations, which are merged with the global aspect-related representations for downstream classification via a softmax classifier. Experimental results show that XLNetCN significantly outperforms the classical BERT and XLNet models as well as traditional feature-based approaches on the two SemEval 2014 benchmark datasets, Laptop and Restaurant, achieving new state-of-the-art results.
INDEX TERMS Aspect-based sentiment analysis, natural language processing, text analysis, deep learning, neural network.
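The auxiliary-sentence step can be illustrated with a minimal sketch. The abstract only states that an auxiliary sentence is constructed to model the sequence-aspect relation in a sentence-pair fashion; the question template and helper name below are illustrative assumptions, not the paper's exact formulation.

```python
def build_auxiliary_input(sentence: str, aspect: str) -> str:
    """Pair a review sentence with an auxiliary sentence about the aspect.

    NOTE: the template "What is the sentiment polarity of ...?" is an
    assumed example; the paper does not specify its exact wording here.
    """
    auxiliary = f"What is the sentiment polarity of the {aspect}?"
    # XLNet encodes sentence pairs as: segment A <sep> segment B <sep> <cls>,
    # with the classification token at the end of the sequence.
    return f"{sentence} [SEP] {auxiliary} [SEP] [CLS]"
```

The paired input lets the pre-trained model attend jointly to the review text and the aspect, which is how sentence-pair formulations inject aspect awareness without task-specific architecture changes.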
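The dynamic routing algorithm used by the capsule network can be sketched as follows. This is a generic routing-by-agreement implementation in NumPy (iteration count and shapes are illustrative assumptions), not the paper's exact configuration.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Squash non-linearity: scales the vector norm into [0, 1)
    # while preserving its direction.
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iters=3):
    """Routing-by-agreement over prediction vectors.

    u_hat: array of shape (num_in, num_out, dim), the predictions
           made by lower-level capsules for each output capsule.
    Returns the output capsule vectors, shape (num_out, dim).
    """
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))  # routing logits, start uniform
    for _ in range(num_iters):
        # Coupling coefficients: softmax over the output capsules.
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        # Weighted sum of predictions for each output capsule.
        s = (c[..., None] * u_hat).sum(axis=0)        # (num_out, dim)
        v = squash(s)                                  # (num_out, dim)
        # Increase logits where predictions agree with the output.
        b = b + np.einsum('iod,od->io', u_hat, v)
    return v
```

Routing iteratively concentrates each input capsule's vote on the output capsules it agrees with, which is what lets the network capture local, spatially hierarchical structure rather than pooling it away.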