2007
DOI: 10.1016/j.neucom.2007.08.006

A neural network approach to audio-assisted movie dialogue detection

Abstract: A novel framework for audio-assisted dialogue detection based on indicator functions and neural networks is investigated. An indicator function defines that an actor is present at a particular time instant. The cross-correlation function of a pair of indicator functions and the magnitude of the corresponding cross-power spectral density are fed as input to neural networks for dialogue detection. Several types of artificial neural networks, including multilayer perceptrons, voted perceptrons, radial basis funct…
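
The feature extraction described in the abstract can be illustrated with a minimal sketch. It assumes each indicator function is a binary sequence sampled at a fixed rate (1 when the actor speaks, 0 otherwise), and it takes the cross-power spectral density to be the discrete Fourier transform of the cross-correlation sequence. The function name `dialogue_features`, the sampling scheme, and the absence of any normalization are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np


def dialogue_features(ind_a, ind_b):
    """Return (cross-correlation, |cross-power spectral density|) for two
    equally long binary indicator sequences.

    Hypothetical helper: the paper's exact estimator and normalization
    are not given in the abstract.
    """
    a = np.asarray(ind_a, dtype=float)
    b = np.asarray(ind_b, dtype=float)
    if a.ndim != 1 or a.shape != b.shape:
        raise ValueError("indicator sequences must be 1-D and equally long")

    # Cross-correlation over all lags -(N-1) .. (N-1).
    xcorr = np.correlate(a, b, mode="full")

    # Assumption: the cross-power spectral density is estimated as the DFT
    # of the cross-correlation sequence; only its magnitude is kept.
    cpsd_mag = np.abs(np.fft.fft(xcorr))
    return xcorr, cpsd_mag


if __name__ == "__main__":
    # Two actors taking turns: A speaks first, then B.
    actor_a = [1, 1, 0, 0]
    actor_b = [0, 0, 1, 1]
    xcorr, cpsd_mag = dialogue_features(actor_a, actor_b)
    lags = np.arange(-(len(actor_a) - 1), len(actor_a))
    print("lag of peak correlation:", lags[np.argmax(xcorr)])
    print("feature vector length:", len(xcorr) + len(cpsd_mag))
```

In this sketch the two arrays would be concatenated (or fed separately) as the input vector of a classifier; which of the two the networks in the paper receive, and in what form, is not recoverable from the truncated abstract.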

Cited by 18 publications (12 citation statements)
References 16 publications
“…The performance of dialogue detection of the proposed system is compared to the performance of a system that uses the ground truth indicator functions in both the training and the test phases [14]. In [14], two splits of the ground truth indicator functions between the training and the test set are examined, namely the 70%/30% training/test split and the 50%/50% training/test split.…”
Section: E Discussion
confidence: 99%
“…In [14], two splits of the ground truth indicator functions between the training and the test set are examined, namely the 70%/30% training/test split and the 50%/50% training/test split. Concerning the RBF networks, for the 70%/30% split CCI is 0.872, while for the 50%/50% split CCI is 0.848.…”
Section: E Discussion
confidence: 99%
“…The better the theoretical cdf approximates the empirical one, the closer the points $\left(q_i, F_k\!\left(\frac{t_{(i)}-\mu}{\sigma}\right)\right)$ are to the diagonal [29]. To validate that IG distribution generally fits best the empirical speaker utterance duration distribution, another dataset has been employed, to be referred to as the movie dataset [30]. In this dataset, 25 audio recordings are included that have been extracted from six movies of different genres, namely: Analyze That, Cold Mountain, Jackie Brown, Lord of the Rings I, Platoon, and Secret Window.…”
Section: A Distribution Of Speaker Utterance Durations
confidence: 99%
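
The probability-probability comparison mentioned in the last citation statement can be sketched as follows. The sketch pairs each ordered duration t_(i) with an empirical probability q_i and with the theoretical cdf evaluated at the standardized value (t_(i) - mu) / sigma, then measures how far the points stray from the diagonal. The plotting position q_i = (i - 0.5) / n, the function names, and the uniform cdf used in the demo are assumptions for illustration; the citing paper's definition of q_i and its inverse Gaussian cdf are not shown in the excerpt.

```python
def pp_points(durations, cdf, mu=0.0, sigma=1.0):
    """Return the P-P plot points (q_i, F((t_(i) - mu) / sigma)).

    `cdf` is any callable implementing a standardized theoretical cdf.
    Assumption: q_i = (i - 0.5) / n is used as the empirical probability.
    """
    ordered = sorted(durations)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one duration is required")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    points = []
    for i, t in enumerate(ordered, start=1):
        q = (i - 0.5) / n
        points.append((q, cdf((t - mu) / sigma)))
    return points


def max_diagonal_deviation(points):
    """Largest vertical distance between the P-P points and the diagonal."""
    return max(abs(q - f) for q, f in points)


def uniform_cdf(x):
    """Stand-in cdf on [0, 1], used only to keep the demo self-contained."""
    return min(max(x, 0.0), 1.0)


if __name__ == "__main__":
    sample = [0.875, 0.125, 0.625, 0.375]
    pts = pp_points(sample, uniform_cdf)
    print("max deviation from diagonal:", max_diagonal_deviation(pts))
```

A smaller deviation indicates a closer fit, so candidate distributions could be ranked by this value; whether the citing paper uses such a summary statistic or only visual inspection is not stated in the excerpt.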