One of the main limitations in the field of audio signal processing is the lack of large public datasets with audio representations and high-quality annotations, owing to copyright restrictions on commercial music. We present the Melon Playlist Dataset, a public dataset of mel-spectrograms for 649,091 tracks and 148,826 associated playlists annotated with 30,652 different tags. All the data is gathered from Melon, a popular Korean streaming service. The dataset is suitable for music information retrieval tasks, in particular auto-tagging and automatic playlist continuation. Even though the latter can be addressed by collaborative filtering approaches, audio provides opportunities for research on track suggestions and for building systems resistant to the cold-start problem, for which we provide a baseline. Moreover, the playlists and the annotations included in the Melon Playlist Dataset make it suitable for metric learning and representation learning.
This article describes an effective human face recognition algorithm. Even though principal component analysis (PCA) is one of the most common feature extraction methods, it is not suitable for implementing a real-time embedded face recognition system because it requires a large computational load and memory capacity. To overcome this problem, we employ incremental two-directional two-dimensional PCA (I(2D)²PCA), which combines (2D)²PCA, whose computational complexity is much lower than that of conventional PCA, with incremental PCA (IPCA), which adapts the eigenspace using only each new incoming sample without reusing all the previously trained data. Furthermore, the modified census transform (MCT), a local normalization method useful for real-world application and implementation in an embedded system, is adopted to address robustness to illumination variations. To achieve better recognition accuracy with less computational load, the processed features are classified by the compressive sensing approach using ℓ2-minimization. Experimental results on the Yale Face Database B show that the described system, which applies the ℓ2-minimization-based classification method to input data processed by the I(2D)²PCA and the MCT, provides efficient and robust face recognition.
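The abstract gives no implementation details for the modified census transform, so the following is only a minimal sketch of the commonly described 3x3 variant, in which each pixel of a neighborhood is compared with the neighborhood mean to form a 9-bit code. The function name `modified_census_transform`, the bit ordering, and the decision to skip border pixels are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def modified_census_transform(image):
    """Return a 9-bit MCT code for every interior pixel of a 2-D grayscale image.

    Hypothetical helper: the 3x3 window, the row-major bit order and the
    dropped one-pixel border are illustrative assumptions.
    """
    img = np.asarray(image, dtype=np.float64)
    if img.ndim != 2 or min(img.shape) < 3:
        raise ValueError("expected a 2-D image of at least 3x3 pixels")
    h, w = img.shape

    # patches[k] holds the k-th pixel (row-major) of the 3x3 window
    # centred on each interior pixel.
    patches = np.stack([
        img[dy:dy + h - 2, dx:dx + w - 2]
        for dy in range(3)
        for dx in range(3)
    ])

    # One bit per window pixel: 1 if it is brighter than the window mean.
    window_mean = patches.mean(axis=0)
    bits = (patches > window_mean).astype(np.int64)

    # Pack the nine bits into one integer, top-left pixel as the highest bit.
    weights = (1 << np.arange(9))[::-1].reshape(9, 1, 1)
    return (bits * weights).sum(axis=0).astype(np.uint16)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    face = rng.integers(0, 256, size=(6, 6))
    codes = modified_census_transform(face)
    # A positive gain and an offset leave the codes unchanged.
    brighter = modified_census_transform(2 * face + 10)
    print(codes.shape, np.array_equal(codes, brighter))
```

Because every comparison is made against the local mean, multiplying the image by a positive gain or adding a constant offset does not change the codes, which is the property that makes this kind of local normalization attractive under varying illumination.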
A speech enhancement method is presented that applies a soft mask to the target speech output of spatial filtering, such as conventional beamforming or independent component analysis (ICA). In contrast to conventional methods, which use either the outputs or the filters estimated by spatial filtering, the mask is constructed by exploiting both the local output signal-to-noise ratio (SNR) and the spatial selectivity obtained from the directivity pattern of the estimated filters. Experiments were conducted with both ICA and minimum power distortionless response beamforming as the spatial filtering stage to demonstrate that the described mask estimation is not tuned to a particular preprocessing method. The results, in terms of both SNR with a retained speech ratio and word accuracy in speech recognition, show that the described method can effectively suppress residual noise in the target speech output of spatial filtering.