A Review of Physical and Perceptual Feature Extraction Techniques for Speech, Music and Environmental Sounds

Alías, Francesc; Socoró, Joan Claudi; Sevillano, Xavier

doi:10.3390/app6050143

Cited by 172 publications

(100 citation statements)

References 196 publications

(240 reference statements)

Supporting

Mentioning

100

Contrasting

“…The database was then used to evaluate a sound event classifier, which obtained better accuracies with foreground sound events than those perceived in the background. Hence, labelling the acoustic salience of audio events could permit more detailed sensitivity analyses of machine hearing approaches [33].…”

Section: Salience Of Environmental Acoustic Events (mentioning)

confidence: 99%

“…This is of special importance when it comes to delimiting the boundaries of each sound event, i.e., determining the start and end points of each sound event in the mixed audio data [15,17] and the event/background ratio salience [24][25][26]. In the context of environmental sounds, it is important to highlight that acoustic events are usually disconnected from one another, which contrasts with speech or music where a strongly interconnected temporal structure of basic units is present (phonemes and notes, respectively) [33].…”

Section: Salience Of Environmental Acoustic Events (mentioning)

confidence: 99%

Description of Anomalous Noise Events for Reliable Dynamic Traffic Noise Mapping in Real-Life Urban and Suburban Soundscapes

Alías

Socoró

2017

Applied Sciences

Self Cite

Abstract: Traffic noise is one of the main pollutants in urban and suburban areas. European authorities have driven several initiatives to study, prevent and reduce the effects of exposure of population to traffic. Recent technological advances have allowed the dynamic computation of noise levels by means of Wireless Acoustic Sensor Networks (WASN) such as that developed within the European LIFE DYNAMAP project. Those WASN should be capable of detecting and discarding non-desired sound sources from road traffic noise, denoted as anomalous noise events (ANE), in order to generate reliable noise level maps. Due to the local, occasional and diverse nature of ANE, some works have opted to artificially build ANE databases at the cost of misrepresentation. This work presents the production and analysis of a real-life environmental audio database in two urban and suburban areas specifically conceived for anomalous noise events' collection. A total of 9 h 8 min of labelled audio data is obtained differentiating among road traffic noise, background city noise and ANE. After delimiting their boundaries manually, the acoustic salience of the ANE samples is automatically computed as a contextual signal-to-noise ratio (SNR). The analysis of the real-life environmental database shows high diversity of ANEs in terms of occurrences, durations and SNRs, as well as confirming both the expected differences between the urban and suburban soundscapes in terms of occurrences and SNRs, and the rare nature of ANE.
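
The abstract above describes salience as a contextual SNR but does not give the formula. As a rough illustration only, the sketch below assumes the SNR is the ratio (in dB) of the mean power inside a manually delimited event to the mean power of the audio immediately surrounding it. The function names and the `context` parameter are hypothetical, not taken from the cited paper.

```python
import math

def mean_power(samples):
    """Mean squared amplitude of a non-empty sequence of samples."""
    return sum(s * s for s in samples) / len(samples)

def contextual_snr_db(signal, start, end, context):
    """SNR in dB of signal[start:end] against its surroundings.

    Assumption: the "context" is up to `context` samples on each
    side of the event; the cited work may define it differently.
    """
    event = signal[start:end]
    surrounding = signal[max(0, start - context):start] + signal[end:end + context]
    if not event or not surrounding:
        raise ValueError("event and context must both be non-empty")
    noise_power = mean_power(surrounding)
    if noise_power == 0:
        raise ValueError("context is silent; SNR is undefined")
    event_power = mean_power(event)
    if event_power == 0:
        return float("-inf")  # a silent event has no salience
    return 10 * math.log10(event_power / noise_power)
```

With this definition, an event whose amplitude is ten times that of its surroundings has an SNR of about 20 dB, and an event quieter than its context has a negative SNR.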

“…After extracting the features from the raw dataset, such features contain important information that is used by the learning algorithms for the activities discrimination. The most common methods of feature extraction work in time, frequency, and discrete domains [192]. Among time domain method, mean and standard deviation are the key approaches for almost all sensor types.…”

Section: Dimensionality Reduction (mentioning)

confidence: 99%

Leveraging Machine Learning and Big Data for Smart Buildings: A Comprehensive Survey

et al. 2019

Future buildings will offer new convenience, comfort, and efficiency possibilities to their residents. Changes will occur to the way people live as technology involves into people's lives and information processing is fully integrated into their daily living activities and objects. The future expectation of smart buildings includes making the residents' experience as easy and comfortable as possible. The massive streaming data generated and captured by smart building appliances and devices contains valuable information that needs to be mined to facilitate timely actions and better decision making. Machine learning and big data analytics will undoubtedly play a critical role to enable the delivery of such smart services. In this paper, we survey the area of smart building with a special focus on the role of techniques from machine learning and big data analytics. This survey also reviews the current trends and challenges faced in the development of smart building services.
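
The citation statement for this survey names mean and standard deviation as the key time-domain features. A minimal sketch of that idea follows, assuming the sensor stream is cut into non-overlapping fixed-size windows. The windowing scheme and the function name `window_features` are illustrative assumptions, not details from the survey.

```python
import statistics

def window_features(samples, window_size):
    """Return a (mean, std) pair for each full window of samples.

    Assumption: windows do not overlap and a trailing partial
    window is dropped; real pipelines often use overlapping windows.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    features = []
    for i in range(0, len(samples) - window_size + 1, window_size):
        window = samples[i:i + window_size]
        # Population standard deviation: the window is treated as complete data.
        features.append((statistics.fmean(window), statistics.pstdev(window)))
    return features
```

Each window yields one two-element feature vector, which a learning algorithm can then consume in place of the raw samples.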

“…For an up-to-date review of feature extraction techniques in music we refer the reader to the study by Alías et al (2016).…”

Section: Feature Extraction (mentioning)

confidence: 99%

Predicting the perception of performed dynamics in music audio with ensemble learning

Elowsson

Friberg

2017

The Journal of the Acoustical Society of America

By varying the dynamics in a musical performance, the musician can convey structure and different expressions. Spectral properties of most musical instruments change in a complex way with the performed dynamics, but dedicated audio features for modeling the parameter are lacking. In this study, feature extraction methods were developed to capture relevant attributes related to spectral characteristics and spectral fluctuations, the latter through a sectional spectral flux. Previously, ground truths ratings of performed dynamics had been collected by asking listeners to rate how soft/loud the musicians played in a set of audio files. The ratings, averaged over subjects, were used to train three different machine learning models, using the audio features developed for the study as input. The highest result was produced from an ensemble of multilayer perceptrons with an R² of 0.84. This result seems to be close to the upper bound, given the estimated uncertainty of the ground truth data. The result is well above that of individual human listeners of the previous listening experiment, and on par with the performance achieved from the average rating of six listeners. Features were analyzed with a factorial design, which highlighted the importance of source separation in the feature extraction.
