Volcanic tremor is a semi‐continuous seismic and/or acoustic signal that occurs at time scales ranging from seconds to years, with variable amplitudes and spectral features. Tremor sources have often been related to fluid movement and degassing processes, and tremor is recognized as a potential geophysical precursor and co‐eruptive signal. Eruption forecasting and monitoring efforts need a fast, robust method to automatically detect, characterize, and catalog volcanic tremor. Here we develop the VOlcano Infrasound and Seismic Spectrogram Network (VOISS‐Net), a pair of convolutional neural networks (one for seismic data, one for acoustic data) that can detect tremor in near real‐time and classify it according to its spectral signature. Specifically, we construct an extensive data set of labeled seismic and low‐frequency acoustic (infrasound) spectrograms from the 2021–2022 eruption of Pavlof Volcano, Alaska, and use it to train VOISS‐Net to distinguish among tremor types, explosions, earthquakes, and noise. We then use VOISS‐Net to classify continuous data from past Pavlof Volcano eruptions (2007, 2013, 2014, 2016, and 2021–2022). VOISS‐Net achieves accuracies of 81.2% and 90.0% on the seismic and infrasound test sets, respectively, and successfully characterizes tremor sequences for each eruption. Comparing the derived seismoacoustic timelines of each eruption with the corresponding eruption chronologies compiled by the Alaska Volcano Observatory, we find that the model identifies changes in tremor regimes that coincide with observed volcanic activity. VOISS‐Net can aid tremor‐related monitoring and research by making consistent tremor catalogs more accessible.