Recently, the COVID-19 pandemic coronavirus has put a lot of pressure on health systems around the world. One of the most common ways to detect COVID-19 is to use chest X-ray images, which have the advantage of being cheap and fast. However, in the early days of the COVID-19 outbreak, most studies applied pretrained convolutional neural network (CNN) models, and the features produced by the last convolutional layer were directly passed into the classification head. In this study, the proposed ensemble model consists of three lightweight networks, Xception, MobileNetV2 and NasNetMobile as three original feature extractors, and then three base classifiers are obtained by adding the coordinated attention module, LSTM and a new classification head to the original feature extractors. The classification results from the three base classifiers are then fused by a confidence fusion method. Three publicly available chest X-ray datasets for COVID-19 testing were considered, with ternary (COVID-19, normal and other pneumonia) and quaternary (COVID-19, normal) analyses performed on the first two datasets, bacterial pneumonia and viral pneumonia classification, and achieved high accuracy rates of 95.56% and 91.20%, respectively. The third dataset was used to compare the performance of the model compared to other models and the generalization ability on different datasets. We performed a thorough ablation study on the first dataset to understand the impact of each proposed component. Finally, we also performed visualizations. These saliency maps not only explain key prediction decisions of the model, but also help radiologists locate areas of infection. Through extensive experiments, it was finally found that the results obtained by the proposed method are comparable to the state-of-the-art methods.