Acoustic monitoring is an effective and scalable way to assess the health of important bioindicators like bats in the wild. However, the large volumes of noisy data that result require accurate tools for automatically determining the presence of different species of interest. Machine learning-based solutions offer the potential to reliably perform this task, but can require expertise to train and deploy. We propose BatDetect2, a novel deep learning-based pipeline for jointly detecting and classifying bat species from acoustic data. Unlike existing deep learning-based acoustic methods, BatDetect2's outputs are interpretable, as they directly indicate at what time and frequency a predicted echolocation call occurs. BatDetect2 also makes use of surrounding temporal information to improve its predictions, while remaining computationally efficient at deployment time. We present experiments on five challenging datasets from four distinct geographical regions (UK, Mexico, Australia, and Brazil). BatDetect2 achieves a mean average precision of 0.88 on a dataset containing 17 bat species from the UK. This is significantly better than the 0.71 obtained by a traditional call parameter extraction baseline method. We show that the same pipeline, without any modifications, can be applied to acoustic data from different regions with different species compositions. The proposed data annotation, model training, and evaluation tools will enable practitioners to easily develop and deploy their own models. BatDetect2 lowers the barrier to entry that prevents researchers from availing of effective deep learning bat acoustic classifiers.