At La Soufrière Volcano, seismo-volcanic signals are classified manually, which is time consuming and can be biased by the subjectivity of the operator. We propose a machine-learning-based model for classifying these signals that can handle large datasets and provide objective, reproducible results. To describe the properties of the signals, we used 104 statistical, entropy, and shape descriptor features computed from the time waveform, the spectrum, and the cepstrum. First, we trained a random forest classifier with a dataset provided by the Observatoire Volcanologique et Sismologique de Guadeloupe that consisted of 845 labeled events recorded from 2013 to 2018: 542 volcano-tectonic (VT), 217 Nested, and 86 long period (LP). We obtained an overall accuracy of 72%. We determined that the VT class includes a variety of signals that cover the VT, Nested, and LP classes. After visual inspection of the waveforms and spectral characteristics of the dataset, we introduced two new classes: Hybrid and Tornillo. A new random forest classifier trained with these revised labels reached a higher overall accuracy of 82%. The model recognizes all event classes well, except Hybrid events (67% accuracy, 70% precision). Hybrid events are often considered to be a mix of VT and LP events. This can be explained by the nature of this class and the physical processes involved, which include both fracturing and resonating components with different modal frequencies. By analyzing the feature weights and training a model with only the most important features, we show that a subset of the 14 best features is sufficient to obtain a performance close to that of the model with the whole feature set. However, these best features differ from the 13 best features obtained for another volcano in Peru, with only one feature common to both sets.
Therefore, either the model is not universal and must be trained for each volcano, or it is too specific to the single station used here.
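The feature-extraction step described above can be illustrated with a minimal sketch. It computes a few statistical, entropy, and shape descriptor features in each of the three domains (time waveform, spectrum, cepstrum). It is not the 104-feature set of the study: the function names, the seven features per domain, the 16-bin histogram entropy, the energy-weighted shape descriptors, and the toy sine waveform are all assumptions made for illustration. Only the Python standard library is used, so the Fourier transform is a naive one suited to short windows.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (adequate for short windows)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t, v in enumerate(values))
        out.append(acc / n if inverse else acc)
    return out


def spectrum(waveform):
    """Magnitude of the DFT at every frequency bin."""
    return [abs(c) for c in dft(waveform)]


def cepstrum(waveform, eps=1e-12):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    log_mag = [math.log(m + eps) for m in spectrum(waveform)]
    return [c.real for c in dft(log_mag, inverse=True)]


def statistical(values):
    """Mean, standard deviation, skewness, and kurtosis of the values."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        skew = kurt = 0.0
    else:
        skew = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
        kurt = sum((v - mean) ** 4 for v in values) / (n * std ** 4)
    return {"mean": mean, "std": std, "skewness": skew, "kurtosis": kurt}


def shannon_entropy(values, bins=16):
    """Shannon entropy (bits) of a histogram of the values; bin count is assumed."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


def shape(values):
    """Centroid and spread of the energy distribution over the sample index."""
    weights = [v * v for v in values]
    total = sum(weights)
    if total == 0:
        return {"centroid": 0.0, "spread": 0.0}
    centroid = sum(i * w for i, w in enumerate(weights)) / total
    spread = math.sqrt(
        sum((i - centroid) ** 2 * w for i, w in enumerate(weights)) / total)
    return {"centroid": centroid, "spread": spread}


def extract_features(waveform):
    """Build one flat feature row from the three signal representations."""
    domains = {
        "time": list(waveform),
        "spectrum": spectrum(waveform),
        "cepstrum": cepstrum(waveform),
    }
    features = {}
    for domain, values in domains.items():
        row = statistical(values)
        row["entropy"] = shannon_entropy(values)
        row.update(shape(values))
        for name, value in row.items():
            features[f"{domain}_{name}"] = value
    return features


# Toy input (hypothetical, not observatory data): 8 cycles of a sine in 64 samples.
n = 64
toy_waveform = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]
features = extract_features(toy_waveform)
```

In a working pipeline, one such feature row per labeled event would form the training matrix for the classifier, and a fast Fourier transform from a numerical library would normally replace the naive `dft` shown here.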