2008
DOI: 10.1109/icpr.2008.4760962

Clip retrieval using multi-modal biometrics in meeting archives

Abstract: We present a system to retrieve all clips from a meeting archive that show a particular individual speaking, using a single face or voice sample as the query. The system incorporates three novel ideas. One, rather than match the query to each individual sample in the archive, samples within a meeting are grouped first, generating a cluster of samples per individual. The query is then matched to the cluster, taking advantage of multiple samples to yield a robust decision. Two, automatic audio-visual association…
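The first idea in the abstract, matching the query against a per-individual cluster rather than against each sample, can be illustrated with a minimal sketch. The truncated abstract does not state the features, the distance measure, or the decision rule, so everything here is an assumption for illustration: samples are plain feature tuples, the cluster score is the mean Euclidean distance, and `cluster_score`, `retrieve`, and `threshold` are hypothetical names.

```python
from math import sqrt
from statistics import mean


def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_score(query, cluster):
    """Mean distance from the query to every sample in one cluster.

    Averaging over all samples is an assumed scoring rule; it stands in
    for whatever cluster-level matching the paper actually uses.
    """
    return mean(euclidean(query, sample) for sample in cluster)


def retrieve(query, clusters, threshold):
    """Return the ids of clusters whose mean distance to the query is below threshold."""
    return [
        cluster_id
        for cluster_id, samples in clusters.items()
        if cluster_score(query, samples) < threshold
    ]


# Toy archive: one meeting, two speakers, two samples each (made-up values).
clusters = {
    "meeting1_speaker0": [(0.0, 0.0), (0.2, 0.0)],
    "meeting1_speaker1": [(5.0, 5.0), (5.0, 5.2)],
}
matches = retrieve((0.1, 0.0), clusters, threshold=1.0)
```

Because the decision uses every sample in a cluster, a single noisy sample has less influence than it would in one-to-one matching, which is the robustness the abstract refers to.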

Cited by 1 publication (2 citation statements)
References 12 publications
“…This is the author's version which has not been fully edited and content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3391817 recognition [31], movie genre recognition [31], [32], video summarization [40], tracking [5], [21], [22], [33], video search & retrieval [36].…”
Section: A Video Recognition or Audio-visual Video Parsing (AVVP)
Citation type: mentioning
confidence: 99%
“…Vajaria et al. proposed various approaches for speaker localization [35] and used them to solve speaker clip retrieval [36] and speaker diarization [37]. They used the Bayesian information criterion to segment feature vectors and graph spectral partitioning to cluster the segments of a speaker in a video clip.…”
Section: ) Pre-deep Learning
Citation type: mentioning
confidence: 99%
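
The citation statement above mentions segmenting feature vectors with the Bayesian information criterion. As a rough illustration of that general technique, not of the cited authors' implementation, the sketch below computes the standard ΔBIC change-point score for a one-dimensional feature sequence modelled with single Gaussians. The function name `delta_bic`, the penalty weight `lam`, the variance floor `eps`, and the toy data are all assumptions.

```python
from math import log
from statistics import pvariance


def delta_bic(x, split, lam=1.0, eps=1e-12):
    """Delta-BIC for splitting the 1-D sequence x at index `split`.

    Compares one Gaussian over all of x against two Gaussians, one per
    side of the split. A positive value favours a change point at `split`.
    Requires 1 <= split <= len(x) - 1.
    """
    n, n1, n2 = len(x), split, len(x) - split

    def var(segment):
        # Floor the variance so log() stays defined for constant segments.
        return max(pvariance(segment), eps)

    gain = 0.5 * (
        n * log(var(x)) - n1 * log(var(x[:split])) - n2 * log(var(x[split:]))
    )
    # Model-complexity penalty: d + d(d+1)/2 extra parameters, with d = 1.
    penalty = lam * 0.5 * (1 + 1) * log(n)
    return gain - penalty


# Toy sequences (made-up values): one with a level shift, one without.
shifted = [0.0, 1.0] * 10 + [10.0, 11.0] * 10
steady = [0.0, 1.0] * 20
score_shifted = delta_bic(shifted, 20)
score_steady = delta_bic(steady, 20)
```

In practice the score would be evaluated at many candidate split points and over multi-dimensional features with full covariance matrices; the one-dimensional case is used here only to keep the arithmetic visible.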