Singer Identification Using Deep Timbre Feature Learning with KNN-NET

Zhang, Xulong; Qian, Jiale; Yu, Yi; Sun, Yun-Yu; Li, Wei

doi:10.1109/icassp39728.2021.9413774

Cited by 18 publications

(3 citation statements)

References 13 publications

“…Singing voice separation (SVS) has drawn a lot of interest and consideration in many downstream applications [1, 2, 3, 4]. It deals with the technique of separating a singing voice or background from a mix of music, which is a crucial strategy for singer identification [5, 6], music information retrieval [7, 8], lyric recognition and alignment [9, 10, 11, 12], song language identification [13, 14], and chord recognition [15, 16, 17]. The recent separation techniques, however, fall well short of the capabilities of human hearing.…”

Section: Introduction (mentioning)

confidence: 99%

Unsupervised Single-Channel Singing Voice Separation with Weighted Robust Principal Component Analysis Based on Gammatone Auditory Filterbank and Vocal Activity Detection

Wang

2023

Sensors

Singing-voice separation is a separation task that involves a singing voice and musical accompaniment. In this paper, we propose a novel, unsupervised methodology for extracting a singing voice from the background in a musical mixture. This method is a modification of robust principal component analysis (RPCA) that separates a singing voice by using weighting based on a gammatone filterbank and vocal activity detection. Although RPCA is a helpful method for separating voices from the music mixture, it fails when one single value, such as drums, is much larger than others (e.g., the accompanying instruments). As a result, the proposed approach takes advantage of varying values between the low-rank (background) and sparse (singing voice) matrices. Additionally, we propose an expanded RPCA on the cochleagram by utilizing coalescent masking on the gammatone. Finally, we utilize vocal activity detection to enhance the separation outcomes by eliminating the lingering music signal. Evaluation results reveal that the proposed approach provides better separation outcomes than RPCA on the ccMixter and DSD100 datasets.

“…Singer identification (SID) is an essential part of MIR, which purpose is to identify performing singers in a given audio sample [2], [3]. SID is used in music library management to address the classification of songs by singers.…”

Section: Introduction (mentioning)

confidence: 99%

Singer Identification for Metaverse with Timbral and Middle-Level Perceptual Features

Zhang, Wang, Cheng et al.

2022

Preprint

Self Cite

Metaverse is an interactive world that combines reality and virtuality, where participants can be virtual avatars. Anyone can hold a concert in a virtual concert hall, and users can quickly identify the real singer behind the virtual idol through singer identification. Most singer identification methods use frame-level features. However, besides the singer's timbre, a music frame also carries musical information such as melodiousness, rhythm, and tonality. This musical information acts as noise when frame-level features are used to identify singers. In this paper, instead of using only frame-level features, we propose two additional features to address this problem. The middle-level feature represents the music's melodiousness, rhythmic stability, and tonal stability, and captures the perceptual features of music. The timbre feature, which is used in speaker identification, represents the singer's voice characteristics. Furthermore, we propose a convolutional recurrent neural network (CRNN) to combine the three features for singer identification. The model first fuses the frame-level feature and the timbre feature and then combines the middle-level features with the fused features. In experiments, the proposed method achieves comparable performance, with an average F1 score of 0.81 on the Artist20 benchmark dataset, which significantly improves on related work.

“…In addition to the feature representation, most research focus on the classifier. Different classifiers have been tried on singer identification, including SVM, GMM, HMM, and random forest [2], [15]-[17]. With the successful application of deep models in various tasks [5], [18], some studies are using deep models to improve performance on singer identification, such as CRNN [19] which is a state of the art method.…”

Section: Introduction (mentioning)

confidence: 99%

MetaSID: Singer Identification with Domain Adaptation for Metaverse

Zhang, Wang, Cheng et al.

2022

Preprint

Self Cite

Metaverse has stretched the real world into unlimited space. There will be more live concerts in Metaverse. The task of singer identification is to identify which singer a song belongs to. However, singer identification faces a tough problem: differing live effects. The studio version differs from the live version, so the data distributions of the training set and the test set differ, and the performance of the classifier decreases. This paper proposes the use of domain adaptation to address the live effect in singer identification. Three domain adaptation methods combined with a Convolutional Recurrent Neural Network (CRNN) are designed: Maximum Mean Discrepancy (MMD), gradient reversal (RevGrad), and Contrastive Adaptation Network (CAN). MMD is a distance-based method, which adds a domain loss. RevGrad is based on the idea that learned features can represent samples from different domains. CAN is based on class adaptation; it takes into account the correspondence between the categories of the source domain and the target domain. Experimental results on the public Artist20 dataset show that CRNN-MMD improves over the baseline CRNN by 0.14. CRNN-RevGrad outperforms the baseline by 0.21. CRNN-CAN achieves state-of-the-art performance, with an F1 measure of 0.83 on the album split.
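The distance-based domain loss mentioned in this abstract can be illustrated with a minimal sketch of a squared MMD estimate between two sets of feature vectors. This is not the paper's implementation: the Gaussian kernel, the `gamma` value, the function names, and the toy "studio" and "live" embeddings are all assumptions made for illustration.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def mean_kernel(xs, ys, gamma):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def mmd_squared(source, target, gamma=1.0):
    """Biased estimate of the squared MMD between source and target samples."""
    return (mean_kernel(source, source, gamma)
            + mean_kernel(target, target, gamma)
            - 2.0 * mean_kernel(source, target, gamma))

# Hypothetical 2-D embeddings standing in for learned features.
studio = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2]]
live_shifted = [[1.0, 1.1], [1.2, 0.9], [0.9, 1.0]]   # distribution far from studio
live_aligned = [[0.1, 0.1], [0.0, 0.2], [0.2, 0.1]]   # distribution close to studio

mmd_shifted = mmd_squared(studio, live_shifted)
mmd_aligned = mmd_squared(studio, live_aligned)
print(mmd_shifted > mmd_aligned)
```

In a training setup of the kind the abstract describes, a term like this would typically be added to the classification loss with some weighting factor, so that minimizing the total loss pulls the studio and live feature distributions together. The weighting and the layer at which features are compared are not stated in the abstract.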
