Abstract: Identifying musical instruments in polyphonic music recordings is a challenging but important problem in the field of music information retrieval. It enables music search by instrument, helps recognize musical genres, and can make music transcription easier and more accurate. In this paper, we present a convolutional neural network framework for predominant instrument recognition in real-world polyphonic music. We train our network on fixed-length music excerpts, each single-labeled with a predominant instrument, and estimate an arbitrary number of predominant instruments from an audio signal of variable length. To obtain an excerpt-wise result, we aggregate multiple outputs from sliding windows over the test audio. In doing so, we investigate two aggregation methods: one takes the average for each instrument, and the other takes the instrument-wise sum followed by normalization. In addition, we conduct extensive experiments on several factors that affect performance, including the analysis window size, the identification threshold, and the activation functions of the network, to find the optimal set of parameters. Evaluating on a dataset of 10k audio excerpts covering 11 instruments, we find that convolutional neural networks are more robust than conventional methods that exploit spectral features and source separation with support vector machines. Experimental results show that the proposed convolutional network architecture obtains micro- and macro-averaged F1 measures of 0.602 and 0.503, a 19.6% and 16.4% performance improvement, respectively, over other state-of-the-art algorithms.
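To make the two aggregation strategies concrete, here is a minimal Python sketch. It assumes the window-level CNN outputs have been collected into a `(num_windows, num_instruments)` array; the function names and the example threshold value are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def aggregate_mean(window_preds: np.ndarray) -> np.ndarray:
    """Strategy 1: average the sliding-window activations per instrument."""
    return window_preds.mean(axis=0)

def aggregate_sum_norm(window_preds: np.ndarray) -> np.ndarray:
    """Strategy 2: sum the activations per instrument, then normalize by the maximum."""
    summed = window_preds.sum(axis=0)
    return summed / summed.max()

def identify(window_preds: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Return indices of instruments whose aggregated score reaches the threshold."""
    scores = aggregate_sum_norm(window_preds)
    return np.flatnonzero(scores >= threshold)
```

The excerpt-level decision then reduces to thresholding the aggregated scores, which is why the identification threshold is one of the parameters tuned in the experiments above.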
Purpose: Breathing sounds during sleep are altered in patients with sleep disordered breathing (SDB) and are characterized by various acoustic specificities. This study aimed to identify acoustic biomarkers indicative of SDB severity by analyzing breathing sounds collected from a large number of subjects over entire nights of sleep.
Methods: The participants were patients who presented at a sleep center with snoring or cessation of breathing during sleep. They underwent full-night polysomnography (PSG), during which breathing sounds were recorded with a microphone. Audio features were then extracted, and the group of features that differed significantly across SDB severity groups was selected as a potential acoustic biomarker. To assess the validity of the acoustic biomarker, classification tasks were performed using several machine learning techniques. Based on the subjects' apnea–hypopnea index, both a four-group classification and a binary classification were performed.
Results: Using tenfold cross validation, we achieved an accuracy of 88.3% in the four-group classification and 92.5% in the binary classification. The experimental evaluation demonstrated that models trained on the proposed acoustic biomarkers can be used to estimate the severity of SDB.
Conclusions: Acoustic biomarkers may be useful for accurately predicting the severity of SDB from a patient's breathing sounds during sleep, without attended full-night PSG. This implies that any device with a microphone, such as a smartphone, could potentially be used outside specialized facilities as a screening tool for detecting SDB.
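The validation scheme described in the Methods and Results can be sketched as follows, assuming the acoustic features have already been extracted into a feature matrix. The random-forest classifier and the placeholder data are assumptions for illustration only; the study compared several machine learning techniques.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Placeholder stand-ins for the extracted acoustic features and labels:
# rows are subjects, columns are audio features; labels are 0 (non-SDB)
# or 1 (SDB) for the binary task.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))
y = rng.integers(0, 2, size=200)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=10)  # tenfold cross validation
print(f"mean cross-validated accuracy: {scores.mean():.3f}")
```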
Abstract: We describe an acoustic chord transcription system that uses symbolic data to train hidden Markov models and gives best-of-class frame-level recognition results. We avoid the extremely laborious task of human annotation of chord names and boundaries, which must be done to provide machine learning models with ground truth, by performing automatic harmony analysis on symbolic music files. In parallel, we synthesize audio from the same symbolic files and extract acoustic feature vectors that are in perfect alignment with the labels. We therefore generate a large set of labeled training data with a minimal amount of human labor, which allows for richer models. Thus, we build 24 key-dependent HMMs, one for each key, using the key information derived from the symbolic data. Each key model defines a unique state-transition characteristic and helps avoid the confusions seen in the observation vectors. Given acoustic input, we identify the musical key by choosing the key model with the maximum likelihood, and we obtain the chord sequence from the optimal state path of the corresponding key model, both of which are returned by a Viterbi decoder. This not only increases the chord recognition accuracy but also yields key information. Experimental results show that models trained on synthesized data perform very well on real recordings, even though the labels automatically generated from symbolic data are not 100% accurate. We also demonstrate the robustness of the tonal centroid feature, which outperforms the conventional chroma feature.
Index Terms: Acoustic chord transcription, hidden Markov model (HMM), key-dependent models, key extraction, symbolic music files.
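The key-selection and decoding step lends itself to a compact sketch. The snippet below assumes 24 trained key-dependent HMMs are available in a dictionary and uses hmmlearn's Gaussian HMMs for illustration; the library choice and emission model are assumptions, not details from the paper.

```python
import numpy as np
from hmmlearn import hmm

def transcribe(features: np.ndarray, key_models: dict[str, hmm.GaussianHMM]):
    """features: (n_frames, n_dims) array of tonal centroid (or chroma) vectors.

    Pick the key model with the maximum likelihood, then read the
    frame-level chord sequence off its optimal (Viterbi) state path.
    """
    best_key = max(key_models, key=lambda k: key_models[k].score(features))
    _, state_path = key_models[best_key].decode(features, algorithm="viterbi")
    return best_key, state_path  # estimated key and chord-state sequence
```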
A dominant sigmoid sinus with either diverticulum or dehiscence (SS-Div/SS-Deh) is a common cause of pulsatile tinnitus (PT). For PT originating from SS-Div/SS-Deh, an etiology-specific, secure reconstruction using firm materials is vital for optimal outcomes. As a follow-up to our previous reports on transmastoid SS resurfacing or reshaping for SS-Div/SS-Deh, this study aimed to evaluate the long-term results of the procedure. We retrospectively reviewed 20 PT patients who were diagnosed with SS-Div/SS-Deh, underwent transmastoid resurfacing/reshaping, and were followed up for more than 1 year postoperatively. Immediate and long-term (> 1 year) changes in PT loudness and annoyance were analyzed using a visual analog scale (VAS). In addition, pre- and postoperative objective measurements of PT using transcanal sound recording and spectro-temporal analysis (TSR-STA), imaging results, and audiological findings were comprehensively analyzed. Significant improvements in PT were sustained or enhanced beyond 1 year (median follow-up: 37 months; range: 12–54 months). On TSR-STA, both the peak and root-mean-square amplitudes decreased after surgery, and the average pure-tone threshold at 250 Hz improved. Thus, our long-term follow-up data confirm that surgical management of PT originating from SS-Div/SS-Deh is successful in terms of both objective and subjective measures.
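For readers unfamiliar with the two objective amplitude measures reported for TSR-STA, the following sketch shows how peak and root-mean-square amplitudes would be computed from a recorded signal; the synthetic 250 Hz tone stands in for a transcanal sound recording and is purely an assumption for illustration.

```python
import numpy as np

# Placeholder signal: a 1-second 250 Hz tone sampled at 8 kHz, standing in
# for a transcanal sound recording of pulsatile tinnitus.
fs = 8000
t = np.arange(0, 1, 1 / fs)
x = 0.1 * np.sin(2 * np.pi * 250 * t)

peak_amplitude = np.max(np.abs(x))        # largest instantaneous amplitude
rms_amplitude = np.sqrt(np.mean(x ** 2))  # root-mean-square amplitude
print(f"peak: {peak_amplitude:.3f}, RMS: {rms_amplitude:.3f}")
```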