A neural network approach to audio-assisted movie dialogue detection

Kotti, Margarita; Benetos, Emmanouil; Kotropoulos, Constantine; Pitas, Ioannis

doi:10.1016/j.neucom.2007.08.006

Cited by 18 publications

(12 citation statements)

References 16 publications


“…The performance of dialogue detection of the proposed system is compared to the performance of a system that uses the ground truth indicator functions in both the training and the test phases [14]. In [14], two splits of the ground truth indicator functions between the training and the test set are examined, namely the 70%/30% training/test split and the 50%/50% training/test split.…”

Section: E Discussion (mentioning)

confidence: 99%

“…In [14], two splits of the ground truth indicator functions between the training and the test set are examined, namely the 70%/30% training/test split and the 50%/50% training/test split. Concerning the RBF networks, for the 70%/30% split CCI is 0.872, while for the 50%/50% split CCI is 0.848.…”

Section: E Discussion (mentioning)

confidence: 99%

“…The particular choice of the duration for the time window is justified in [14]. In short, after modeling the empirical distribution of the actor utterance duration, it is found that it is the Inverse Gaussian with expected value equal to 5 s. This means that actor changes are expected to occur, on average, every 5 s. We consider that four actor changes should occur within the time-window employed in our analysis on average.…”

Section: A Database (mentioning)

confidence: 99%

“…Preliminary results on audio-assisted movie dialogue detection are described in [14] that resort to actor indicator functions. An actor indicator function defines if an actor speaks at a certain time instant.…”

Section: Introduction (mentioning)

confidence: 99%
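The actor indicator function described in the Introduction statement above can be illustrated with a minimal sketch. It assumes that speech segments are available as (start, end) pairs in seconds and that the function is sampled once per second; the name `indicator_function`, the sampling rate, and the example segments are hypothetical and do not come from the cited papers.

```python
def indicator_function(segments, duration_s, rate_hz=1.0):
    """Return a 0/1 list; sample n is 1 if the actor speaks at time n / rate_hz.

    `segments` is a list of (start_s, end_s) pairs (hypothetical annotation format).
    """
    n_samples = int(round(duration_s * rate_hz))
    indicator = [0] * n_samples
    for start_s, end_s in segments:
        # Clip each segment to the analysed time window.
        first = max(0, int(round(start_s * rate_hz)))
        last = min(n_samples, int(round(end_s * rate_hz)))
        for n in range(first, last):
            indicator[n] = 1
    return indicator


# Two actors alternating every 5 s over a 20 s window (made-up example data).
actor_a = indicator_function([(0.0, 5.0), (10.0, 15.0)], duration_s=20.0)
actor_b = indicator_function([(5.0, 10.0), (15.0, 20.0)], duration_s=20.0)
```

In this toy example exactly one actor speaks at every sample, which is the alternating pattern one would expect of a dialogue.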


Audio-Assisted Movie Dialogue Detection

Kotti, Ververidis, Evangelopoulos, et al. 2008. IEEE Trans. Circuits Syst. Video Technol.


Abstract: An audio-assisted system is investigated that detects whether a movie scene is a dialogue or not. The system is based on actor indicator functions, that is, functions that define whether an actor speaks at a certain time instant. In particular, the cross-correlation and the magnitude of the corresponding cross-power spectral density of a pair of indicator functions are input to various classifiers, such as voted perceptrons, radial basis function networks, random trees, and support vector machines, for dialogue/non-dialogue detection. To boost classifier efficiency, AdaBoost is also exploited. The aforementioned classifiers are trained using ground truth indicator functions determined by human annotators for 41 dialogue and another 20 non-dialogue audio instances. For testing, actual indicator functions are derived by applying audio activity detection and actor clustering to audio recordings. 23 instances are randomly chosen among the aforementioned instances, 17 of which correspond to dialogue scenes and 6 to non-dialogue ones. Accuracy ranging between 0.739 and 0.826 is reported.
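The two features named in the abstract, the cross-correlation of a pair of indicator functions and the magnitude of the corresponding cross-power spectral density, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the cross-correlation is taken to be circular and normalised by the signal length, the spectral density is estimated directly from the discrete Fourier transforms of the two signals, and all function names and the example data are hypothetical.

```python
import cmath


def dft(x):
    """Plain O(N^2) discrete Fourier transform; adequate for short signals."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def circular_cross_correlation(a, b):
    """r[lag] = (1/N) * sum_t a[t] * b[(t + lag) mod N]  (assumed definition)."""
    if len(a) != len(b):
        raise ValueError("indicator functions must have equal length")
    n = len(a)
    return [sum(a[t] * b[(t + lag) % n] for t in range(n)) / n for lag in range(n)]


def cross_power_spectrum_magnitude(a, b):
    """|conj(A[k]) * B[k]| / N, i.e. the magnitude of the DFT of r[lag] above."""
    if len(a) != len(b):
        raise ValueError("indicator functions must have equal length")
    n = len(a)
    fa, fb = dft(a), dft(b)
    return [abs(fa[k].conjugate() * fb[k]) / n for k in range(n)]


# Made-up indicator functions: two actors alternating every 5 samples.
actor_a = [1] * 5 + [0] * 5 + [1] * 5 + [0] * 5
actor_b = [0] * 5 + [1] * 5 + [0] * 5 + [1] * 5

xcorr = circular_cross_correlation(actor_a, actor_b)
cpsd_mag = cross_power_spectrum_magnitude(actor_a, actor_b)
```

For these alternating signals the cross-correlation is zero at lag 0 and peaks at the lag equal to the turn length. A classifier input could then be formed from `xcorr` and `cpsd_mag`, although how the original system assembles its feature vector is not stated in the abstract.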


“…The better the theoretical cdf approximates the empirical one, the closer the points $\left(q_i, F_k\left(\frac{t_{(i)}-\mu}{\sigma}\right)\right)$ are to the diagonal [29]. To validate that IG distribution generally fits best the empirical speaker utterance duration distribution, another dataset has been employed, to be referred to as the movie dataset [30]. In this dataset, 25 audio recordings are included that have been extracted from six movies of different genres, namely: Analyze That, Cold Mountain, Jackie Brown, Lord of the Rings I, Platoon, and Secret Window.…”

Section: A Distribution Of Speaker Utterance Durations (mentioning)

confidence: 99%

Computationally Efficient and Robust BIC-Based Speaker Segmentation

Kotti, Benetos, Kotropoulos, 2008. IEEE Trans. Audio Speech Lang. Process.


An algorithm for automatic speaker segmentation based on the Bayesian Information Criterion (BIC) is presented. BIC tests are not performed for every window shift (e.g. every milliseconds), as previously, but when a speaker change is most likely to occur. This is done by estimating the next probable change point thanks to a model of utterance durations. It is found that the inverse Gaussian fits best the distribution of utterance durations. As a result, fewer BIC tests are needed, making the proposed system less computationally demanding in time and memory, and considerably more efficient with respect to missed speaker change points. A feature selection algorithm based on a branch and bound search strategy is applied in order to identify the most efficient features for speaker segmentation. Furthermore, a new theoretical formulation of BIC is derived by applying centering and simultaneous diagonalization. This formulation is considerably more computationally efficient than the standard BIC when the covariance matrices are estimated by estimators other than the usual maximum likelihood ones. Two commonly used pairs of figures of merit are employed and their relationship is established. Computational efficiency is achieved through the speaker utterance modeling, whereas robustness is achieved by feature selection and application of BIC tests at appropriately selected time instants. Experimental results indicate that the proposed modifications yield a superior performance compared to existing approaches.
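The BIC test that the abstract refers to can be sketched in its textbook form. This is not the centered, simultaneously diagonalized formulation the paper derives; it assumes a one-dimensional feature, one Gaussian per segment with maximum likelihood variances, and the convention that a positive value favours a speaker change. The function names, the `penalty_weight` parameter, and the example data are hypothetical.

```python
import math


def ml_variance(x, floor=1e-12):
    """Maximum likelihood variance, floored to keep the logarithm finite."""
    mean = sum(x) / len(x)
    return max(sum((v - mean) ** 2 for v in x) / len(x), floor)


def delta_bic(x, split, penalty_weight=1.0):
    """Standard delta-BIC for a candidate change point at index `split`.

    Compares one Gaussian fitted to all of `x` against two Gaussians fitted
    to x[:split] and x[split:]. Positive values suggest a change point.
    """
    left, right = x[:split], x[split:]
    if len(left) < 2 or len(right) < 2:
        raise ValueError("each side of the split needs at least two samples")
    n = len(x)
    # Log-likelihood gain from modelling the two sides separately.
    gain = 0.5 * (n * math.log(ml_variance(x))
                  - len(left) * math.log(ml_variance(left))
                  - len(right) * math.log(ml_variance(right)))
    # The two-model hypothesis has one extra mean and one extra variance.
    n_extra_params = 2
    penalty = 0.5 * penalty_weight * n_extra_params * math.log(n)
    return gain - penalty


# Made-up feature sequences: one with a jump at sample 20, one homogeneous.
pattern = [0.0, 0.1, -0.1, 0.05, -0.05]
with_change = pattern * 4 + [v + 5.0 for v in pattern] * 4
no_change = pattern * 8

bic_change = delta_bic(with_change, split=20)
bic_no_change = delta_bic(no_change, split=20)
```

With this convention `bic_change` is positive and `bic_no_change` is negative. The saving described in the abstract comes from evaluating such a test only at time instants where the utterance duration model makes a change likely, instead of at every window shift.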


Detection of Dialogue in Movie Soundtrack for Speech Intelligibility Enhancement

Lopatka, 2014. Communications in Computer and Information Science
