This work explores the effect of gender- and language-based vocal variations on the accuracy of emotive expression classification. Emotive expressions are characterised by spectral features of speech (Mel-frequency Cepstral Coefficients, Mel-spectrogram, and Spectral Contrast), and emotions are framed in terms of Basic Emotion Theory. A convolutional neural network is used to classify emotive expressions in emotive audio datasets in English, German, and Italian. Vocal variations in spectral features are assessed through (i) a comparative analysis identifying suitable spectral features, (ii) classification performance on mono-, multi-, and cross-lingual emotive data, and (iii) an empirical evaluation of a machine learning model to assess the effects of gender and linguistic variation on classification accuracy. The results show that spectral features offer a promising avenue for improving emotive expression classification accuracy. Classification accuracy was high for mono- and cross-lingual emotive data but poor for multi-lingual data, and it also differed between gender populations. These results demonstrate the importance of accounting for population differences to enable accurate speech emotion recognition.
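The abstract names three spectral features but not how they are obtained. The following is a minimal sketch, not the authors' code, of extracting those features with librosa; the file path, sample rate, and time-averaging step are illustrative assumptions.

```python
# Sketch: extract the three spectral features named in the abstract with librosa.
# The audio path, sample rate, and mean-over-time pooling are assumptions.
import numpy as np
import librosa

def extract_spectral_features(path, sr=16000, n_mfcc=40):
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)        # (n_mfcc, T)
    mel = librosa.feature.melspectrogram(y=y, sr=sr)              # (128, T)
    mel_db = librosa.power_to_db(mel, ref=np.max)
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr)      # (7, T)
    # Summarise each feature over time so every clip yields a fixed-length vector
    # that can be fed to a classifier (a CNN would instead consume the 2-D maps).
    return np.concatenate([
        mfcc.mean(axis=1),
        mel_db.mean(axis=1),
        contrast.mean(axis=1),
    ])

# features = extract_spectral_features("clip.wav")  # hypothetical file
```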
In biomedical image analysis, data imbalance is common across many imaging modalities, and data augmentation is one of the key strategies for addressing it. Generative Adversarial Networks (GANs) are increasingly relied upon for such augmentation tasks. Biomedical image features strongly influence the evaluation of synthetic images and can have a significant impact on metric scores across different biomedical imaging modalities. Synthetically generated images can be evaluated by comparing their diversity and quality against real images: the Multi-scale Structural Similarity Index Measure and Cosine Distance are used to evaluate intra-class diversity, while the Fréchet Inception Distance is used to evaluate the quality of synthetic images. Assessing these metrics on both biomedical and non-biomedical imaging is important for developing an informed strategy for evaluating the diversity and quality of synthetic images. In this work, an empirical assessment of these metrics is conducted for the Deep Convolutional GAN in biomedical and non-biomedical settings, with diversity and quality evaluated at different sample sizes, in order to investigate how they vary across imaging modalities. The results demonstrate that the metric scores for diversity and quality vary significantly across biomedical-to-biomedical and biomedical-to-non-biomedical comparisons.
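As a concrete illustration of the three metrics named above, the sketch below computes them with torchmetrics and PyTorch. It is not the paper's evaluation code: the image tensors, batch sizes, image resolution, and the pairwise-splitting scheme for diversity are placeholder assumptions.

```python
# Sketch (assumptions, not the paper's pipeline): MS-SSIM and cosine distance
# for intra-class diversity, FID for quality of synthetic images.
import torch
from torchmetrics.image import MultiScaleStructuralSimilarityIndexMeasure
from torchmetrics.image.fid import FrechetInceptionDistance

fake = torch.rand(64, 3, 256, 256)   # synthetic batch (placeholder data)
real = torch.rand(64, 3, 256, 256)   # real batch (placeholder data)

# Intra-class diversity: MS-SSIM between pairs drawn from the synthetic set
# (lower similarity implies higher diversity).
ms_ssim = MultiScaleStructuralSimilarityIndexMeasure(data_range=1.0)
diversity_ms_ssim = ms_ssim(fake[:32], fake[32:])

# Cosine distance (1 - cosine similarity) between flattened synthetic pairs.
cos = torch.nn.functional.cosine_similarity(
    fake[:32].flatten(1), fake[32:].flatten(1)
)
cosine_distance = (1 - cos).mean()

# Quality: FID between real and synthetic images (expects uint8 inputs).
fid = FrechetInceptionDistance(feature=2048)
fid.update((real * 255).to(torch.uint8), real=True)
fid.update((fake * 255).to(torch.uint8), real=False)

print(diversity_ms_ssim.item(), cosine_distance.item(), fid.compute().item())
```

In practice FID is computed over far more than 64 images per set; the small batches here are only to keep the sketch self-contained.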