In a multi-speaker scenario, a hearing aid lacks information on which speaker the user intends to attend to, and therefore it may mistakenly treat the attended speaker as noise while enhancing an interfering speaker. Recently, it has been shown that it is possible to decode the attended speaker from brain activity, e.g., recorded by electroencephalography sensors. While numerous such auditory attention decoding (AAD) algorithms have appeared in the literature, their performance is generally evaluated in a non-uniform manner. Furthermore, AAD algorithms typically introduce a trade-off between the AAD accuracy and the time needed to make an AAD decision, which hampers objective benchmarking, as it remains unclear which point in each algorithm's trade-off space is the optimal one in the context of neuro-steered gain control. To address this, we present an interpretable performance metric to evaluate AAD algorithms, based on an adaptive gain control system steered by AAD decisions. Such a system can be modeled as a Markov chain, from which the minimal expected switch duration (MESD) can be calculated and interpreted as the expected time required to switch the operation of the hearing aid after the user switches attention, thereby resolving the trade-off between AAD accuracy and decision time. Furthermore, we show that the MESD calculation provides an automatic and theoretically founded procedure to optimize the number of gain levels and decision time in an AAD-based adaptive gain control system.
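As a rough illustration of the Markov-chain idea, the sketch below computes the expected time to reach the highest gain level for a simplified birth-death chain: every T seconds an AAD decision moves the gain one level up with probability p (a correct decision) and one level down otherwise, with a reflecting barrier at the lowest level and the target level treated as absorbing. The function name, the reflecting-boundary assumption, and the example parameter values are illustrative assumptions; the actual MESD metric additionally optimizes over the number of gain levels and the decision time, which is not shown here.

```python
import numpy as np

def expected_switch_duration(p, T, n_levels, start_level=0):
    """Expected time (seconds) to first reach the highest gain level.

    The AAD-steered gain control is modeled as a birth-death Markov chain:
    each decision (every T seconds) moves one level up with probability p
    and one level down with probability 1 - p; the lowest level reflects,
    and the highest level is absorbing (the target state).
    """
    n = n_levels
    # Transition matrix restricted to the transient states 0 .. n-2
    Q = np.zeros((n - 1, n - 1))
    for s in range(n - 1):
        up, down = p, 1.0 - p
        if s == 0:
            Q[s, s] += down          # reflecting lower boundary: stay at level 0
        else:
            Q[s, s - 1] += down      # incorrect decision: one level down
        if s + 1 <= n - 2:
            Q[s, s + 1] += up        # correct decision: one level up
        # transitions into the absorbing top level are omitted from Q
    # Expected number of decisions until absorption: t = (I - Q)^{-1} 1
    t = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
    return T * t[start_level]

# Illustrative numbers: 80% per-decision accuracy, 10 s decisions, 5 gain levels
print(expected_switch_duration(p=0.8, T=10.0, n_levels=5))
```

The expected number of decisions follows from the standard first-passage relation t = 1 + Q t over the transient states; multiplying by the decision time T converts it to seconds, making explicit how a shorter decision window can compensate for a lower per-decision accuracy, and vice versa.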