Background: The application of machine learning to cardiac auscultation has the potential to improve the accuracy and efficiency of both routine and point-of-care screenings. The use of convolutional neural networks (CNNs) on heart sound spectrograms in particular has defined state-of-the-art performance. However, the relative paucity of patient data remains a significant barrier to creating models that can adapt to a wide range of potential variability. To that end, we examined a CNN model’s performance on automated heart sound classification before and after various forms of data augmentation, and aimed to identify the optimal augmentation methods for cardiac spectrogram analysis.

Results: We built a standard CNN model to classify cardiac sound recordings as either normal or abnormal. The baseline control model achieved an area under the precision-recall curve (PR AUC) of 0.763 ± 0.047. Among the single data augmentation techniques explored, horizontal flipping of the spectrogram image improved model performance the most, with a PR AUC of 0.819 ± 0.044. Principal component analysis (PCA) color augmentation and perturbations of the saturation and value (SV) channels of the hue-saturation-value (HSV) color scale achieved PR AUCs of 0.779 ± 0.045 and 0.784 ± 0.037, respectively. Time and frequency masking resulted in a PR AUC of 0.772 ± 0.050. Pitch shifting, time stretching and compressing, noise injection, vertical flipping, and applying random color filters negatively impacted model performance. Concatenating the best-performing data augmentation technique (horizontal flip) with PCA and SV perturbations improved model performance.

Conclusion: Data augmentation can improve classification accuracy by expanding and diversifying the dataset, which protects against overfitting to random variance. However, data augmentation is necessarily domain specific. For example, methods like noise injection have found success in other areas of automated sound classification, but in the context of cardiac sound analysis, noise injection can mimic the presence of murmurs and worsen model performance. Thus, care should be taken to use clinically appropriate forms of data augmentation to avoid negatively impacting model performance.
Background: The application of machine learning to cardiac auscultation has the potential to improve the accuracy and efficiency of both routine and point-of-care screenings. The use of convolutional neural networks (CNNs) on heart sound spectrograms in particular has defined state-of-the-art performance. However, the relative paucity of patient data remains a significant barrier to creating models that can adapt to the wide range of between-subject variability. To that end, we examined a CNN model’s performance on automated heart sound classification before and after various forms of data augmentation, and aimed to identify the optimal augmentation methods for cardiac spectrogram analysis.

Results: We built a standard CNN model to classify cardiac sound recordings as either normal or abnormal. The baseline control model achieved an area under the receiver operating characteristic curve (ROC AUC) of 0.945 ± 0.016. Among the data augmentation techniques explored, horizontal flipping of the spectrogram image improved model performance the most, with an ROC AUC of 0.957 ± 0.009. Principal component analysis (PCA) color augmentation and perturbations of the saturation and value (SV) channels of the hue-saturation-value (HSV) color scale achieved ROC AUCs of 0.949 ± 0.014 and 0.946 ± 0.019, respectively. Time and frequency masking resulted in an ROC AUC of 0.948 ± 0.012. Pitch shifting, time stretching and compressing, noise injection, vertical flipping, and applying random color filters all negatively impacted model performance.

Conclusion: Data augmentation can improve classification accuracy by expanding and diversifying the dataset, which protects against overfitting to random variance. However, data augmentation is necessarily domain specific. For example, methods like noise injection have found success in other areas of automated sound classification, but in the context of cardiac sound analysis, noise injection can mimic the presence of murmurs and worsen model performance. Thus, care should be taken to use clinically appropriate forms of data augmentation to avoid negatively impacting model performance.
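PCA color augmentation presumably refers to the "fancy PCA" scheme introduced with AlexNet (Krizhevsky et al., 2012), which shifts every pixel's RGB value along the principal components of the color distribution, weighted by the corresponding eigenvalues and random Gaussian coefficients. The sketch below rests on stated assumptions: it operates on a single H×W×3 spectrogram image with float values in [0, 1], computes the PCA per image rather than over the whole training set as AlexNet did, and uses a standard deviation of 0.1 for the random coefficients. The function name is hypothetical.

```python
import numpy as np


def pca_color_augment(
    image: np.ndarray, rng: np.random.Generator, sigma: float = 0.1
) -> np.ndarray:
    """AlexNet-style "fancy PCA" color jitter on one RGB image in [0, 1].

    Simplification (assumption): the 3x3 RGB covariance is estimated from this
    image alone instead of from the full training set.
    """
    pixels = image.reshape(-1, 3).astype(np.float64)

    # 3x3 covariance of the RGB channels (np.cov centers the data itself).
    cov = np.cov(pixels, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # columns of eigvecs are principal axes

    # Random weight per component, scaled by that component's eigenvalue.
    alphas = rng.normal(0.0, sigma, size=3)
    delta = eigvecs @ (alphas * eigvals)  # one RGB offset shared by all pixels

    # Broadcast the offset over H x W and keep values in the valid range.
    return np.clip(image + delta, 0.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.uniform(0.0, 1.0, size=(4, 4, 3))
    print(pca_color_augment(img, rng)[0, 0])
```

Because a single RGB offset is added to every pixel, this transform alters the color rendering of the spectrogram image while leaving its time-frequency layout untouched; setting `sigma=0.0` returns the input unchanged.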