ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp49357.2023.10094714
The USTC System for ADReSS-M Challenge

Cited by 7 publications (3 citation statements). References 4 publications.
“…Considering the small size of the ADReSS-M data set and the fact that the picture descriptions differed between the training and test sets (not only in language but also in content, as the pictures were different), we expected the proposed models to rely on more abstract acoustic features rather than on lexical or structural linguistic features, as the former are presumably less language-dependent than the latter [26], [59], [60]. Indeed, this was the case for most submissions, as four of the top-scoring teams [46], [49], [50], [51] employed acoustic features exclusively (even though in some cases ASR output was used to derive dysfluency and pause features). However, some of the submitted models, including one of the top five [52], employed linguistic features, either by themselves or in combination with acoustic and paralinguistic features.…”
Section: Discussion
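
The parenthetical remark about ASR-derived pause features lends itself to a short illustration. The following is a minimal Python sketch, not any team's actual pipeline: the Word records stand in for timestamped ASR output, and the 0.25 s pause threshold is an assumed value.

# Hypothetical sketch: pause features from timestamped ASR output.
from dataclasses import dataclass
from typing import List

@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds

def pause_features(words: List[Word], min_pause: float = 0.25) -> dict:
    """Compute simple pause statistics from consecutive word timings."""
    gaps = [b.start - a.end for a, b in zip(words, words[1:])]
    pauses = [g for g in gaps if g >= min_pause]
    speech_time = sum(w.end - w.start for w in words)
    total_pause = sum(pauses)
    return {
        "num_pauses": len(pauses),
        "mean_pause_dur": total_pause / len(pauses) if pauses else 0.0,
        "pause_rate": len(pauses) / speech_time if speech_time else 0.0,
        "pause_speech_ratio": total_pause / speech_time if speech_time else 0.0,
    }

# Example with made-up timestamps:
words = [Word("the", 0.00, 0.15), Word("cat", 0.55, 0.90), Word("sat", 1.80, 2.10)]
print(pause_features(words))

With the made-up timestamps above, the two inter-word gaps (0.40 s and 0.90 s) both exceed the threshold and are counted as pauses; such language-independent timing statistics are one way acoustic-only systems can still exploit ASR output.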
“…The team that came in second place employed a mixed-batch transfer learning approach for both tasks, applied to eGeMAPS acoustic features [49]. The third highest scoring team explored a wider range of acoustic feature extraction methods, employing an XGBoost classifier for the classification task and SVM and XGBoost regressors for MMSE prediction [50]. The fourth ranked team employed an automatic speech recognition system to derive speech intelligibility features based on confidence scores assigned by the system, which, along with word-level duration and pause features, formed the input for logistic regression and SVM regression models for tasks 1 and 2, respectively [51].…”
Section: Rank of Submissions
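
The general recipe this excerpt describes (eGeMAPS functionals feeding an XGBoost classifier for diagnosis and an SVM regressor for MMSE) can be sketched as follows. This is a minimal illustration, assuming the opensmile and xgboost Python packages; the random features stand in for real recordings, and none of the hyperparameters are taken from the cited systems.

# Minimal sketch: eGeMAPS functionals -> XGBoost classifier + SVM regressor.
import numpy as np
import opensmile                      # pip install opensmile
from sklearn.svm import SVR
from xgboost import XGBClassifier     # pip install xgboost

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,      # 88 functionals per file
    feature_level=opensmile.FeatureLevel.Functionals,
)

def extract_egemaps(wav_paths):
    """One 88-dimensional eGeMAPS functional vector per recording."""
    return np.vstack([smile.process_file(p).to_numpy() for p in wav_paths])

# Stand-in data: 100 "recordings", binary diagnosis labels, MMSE scores.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 88))
y_diag = rng.integers(0, 2, size=100)
y_mmse = rng.uniform(10.0, 30.0, size=100)

clf = XGBClassifier(n_estimators=200, max_depth=3, eval_metric="logloss")
clf.fit(X, y_diag)                               # task 1: AD vs. control
reg = SVR(kernel="rbf", C=1.0).fit(X, y_mmse)    # task 2: MMSE prediction

In practice X would come from extract_egemaps() applied to the challenge recordings; the stand-in data here only demonstrates the interfaces.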