1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258) 1999
DOI: 10.1109/icassp.1999.757481

Unsupervised clustering of ambulatory audio and video

Abstract: A truly personal and reactive computer system should have access to the same information as its user, including the ambient sights and sounds. To this end, we have developed a system for extracting events and scenes from natural audio-visual input. We find that our system can, without any prior labeling of data, cluster the audio-visual data into events, such as passing through doors and crossing the street. We also hierarchically cluster these events into scenes and get clusters that correlate with visiting the sup…

Cited by 94 publications (62 citation statements)
References 4 publications
“…This scenario typically uses one camera and/or audio recorder, with either a fixed installation such as close-circuit surveillance [133] or moving in space such as unmanned aerial vehicle (UAV) [88] or lifelogs [28], [42], [133]. The scope of data analysis is within a start/stop of the recording device.…”
Section: Single Stream From One Continuous Take (mentioning)
confidence: 99%
“…A continuous media sequence can be either presegmented into fixed-length units or jointly clustered and segmented by generative models typically in the HMM/DBN family. Clarkson and Pentland [28] cluster ambulatory audiovisual streams with HMMs to identify different user locations. Xie et al. [148] find that recurrent frames and shot sequences in sports and news programs often correspond to domain-specific multilevel motifs found using hierarchical HMM.…”
Section: Unsupervised Event Discovery (mentioning)
confidence: 99%
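The first option named in the statement above, presegmenting a continuous sequence into fixed-length units, can be sketched as follows. This is only an illustration: the function name `presegment` and the optional `hop` parameter are assumptions, not something taken from the cited papers, and a trailing partial unit is simply dropped.

```python
def presegment(frames, unit_len, hop=None):
    """Split a sequence of frames into fixed-length units.

    `hop` is the step between unit starts (hypothetical parameter);
    it defaults to `unit_len`, which gives non-overlapping units.
    A trailing remainder shorter than `unit_len` is dropped.
    """
    if unit_len <= 0:
        raise ValueError("unit_len must be positive")
    hop = unit_len if hop is None else hop
    if hop <= 0:
        raise ValueError("hop must be positive")
    last_start = len(frames) - unit_len
    return [frames[i:i + unit_len] for i in range(0, last_start + 1, hop)]
```

Each returned unit could then be summarized by a feature vector and passed to any clustering method; the joint HMM/DBN alternative in the statement avoids this fixed boundary choice by segmenting and clustering together.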
“…The idea of a "life-log" or a personal digital archive is a notion that can be traced back at least 60 years [16]. Since then a variety of modern projects have spawned such as the Remembrance Agent [17], the Familiar [18] [19], myLifeBits [20], Memories for Life [21] and What Was I Thinking [22]. In [23] the authors evaluate the user's context in real time and then use variables like current location, activity, and social interaction to predict moments of interest.…”
Section: Introduction (mentioning)
confidence: 99%
“…Clarkson and Pentland studied user context awareness using audio, video, and other sensory streams [2], [3], [4], [5], [6] in the context of a system designed to extract personal life patterns from sensory data. This system employed feature-level fusion and HMM clustering techniques to learn common scenarios in everyday life.…”
Section: Related Work (mentioning)
confidence: 99%
“…We used seven different training configurations to test the effect of varying numbers of parameters. Specifically, we trained models with 2, 4, 8, 12, 16, 20, and 24 hidden units or Gaussians. Each autoencoder was initialized with random weights between -0.05 and 0.05 and trained on the whitened features for 100 iterations using a batch-mode backpropagation algorithm with an adaptive learning rate initialized at 0.05, a momentum term of 0.045, and batch shuffling.…”
Section: Training (mentioning)
confidence: 99%
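The autoencoder training setup quoted above can be approximated with a small pure-Python sketch. Only the numbers come from the statement: weights drawn from [-0.05, 0.05], 100 iterations, an initial learning rate of 0.05, momentum 0.045, shuffled batches, and the seven hidden-layer sizes. Everything else is an assumption made for illustration: the single tanh hidden layer with a linear output, the absence of bias terms, the batch size of 8, and the grow/shrink factors of the learning-rate rule. The citing paper does not specify these, and the Gaussian models it mentions are not covered here.

```python
import math
import random

# Hidden-layer sizes listed in the quoted statement.
HIDDEN_SIZES = (2, 4, 8, 12, 16, 20, 24)


def train_autoencoder(data, n_hidden, n_iters=100, lr=0.05, momentum=0.045,
                      init_range=0.05, batch_size=8, seed=0):
    """Train a one-hidden-layer autoencoder; return per-iteration mean squared error.

    Assumed (not from the source): tanh hidden units, linear outputs, no biases,
    batch_size=8, and a simple "grow by 1%, halve on failure" learning-rate rule.
    """
    rng = random.Random(seed)
    n_in = len(data[0])
    # Encoder (n_hidden x n_in) and decoder (n_in x n_hidden) weights.
    enc = [[rng.uniform(-init_range, init_range) for _ in range(n_in)]
           for _ in range(n_hidden)]
    dec = [[rng.uniform(-init_range, init_range) for _ in range(n_hidden)]
           for _ in range(n_in)]
    v_enc = [[0.0] * n_in for _ in range(n_hidden)]   # momentum buffers
    v_dec = [[0.0] * n_hidden for _ in range(n_in)]
    order = list(range(len(data)))
    history, prev_err = [], math.inf

    for _ in range(n_iters):
        rng.shuffle(order)                            # batch shuffling
        total_sq = 0.0
        for start in range(0, len(order), batch_size):
            batch = [data[i] for i in order[start:start + batch_size]]
            g_enc = [[0.0] * n_in for _ in range(n_hidden)]
            g_dec = [[0.0] * n_hidden for _ in range(n_in)]
            for x in batch:                           # accumulate batch gradient
                h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in enc]
                y = [sum(w * hj for w, hj in zip(row, h)) for row in dec]
                err = [yi - xi for yi, xi in zip(y, x)]
                total_sq += sum(e * e for e in err)
                for i in range(n_in):
                    for j in range(n_hidden):
                        g_dec[i][j] += err[i] * h[j]
                for j in range(n_hidden):
                    back = sum(err[i] * dec[i][j] for i in range(n_in))
                    delta = back * (1.0 - h[j] ** 2)  # derivative of tanh
                    for k in range(n_in):
                        g_enc[j][k] += delta * x[k]
            scale = lr / len(batch)
            for i in range(n_in):                     # momentum update, decoder
                for j in range(n_hidden):
                    v_dec[i][j] = momentum * v_dec[i][j] - scale * g_dec[i][j]
                    dec[i][j] += v_dec[i][j]
            for j in range(n_hidden):                 # momentum update, encoder
                for k in range(n_in):
                    v_enc[j][k] = momentum * v_enc[j][k] - scale * g_enc[j][k]
                    enc[j][k] += v_enc[j][k]
        epoch_err = total_sq / (len(data) * n_in)
        # Assumed adaptive rule: grow slightly on improvement, halve otherwise.
        lr *= 1.01 if epoch_err < prev_err else 0.5
        prev_err = epoch_err
        history.append(epoch_err)
    return history
```

Calling `train_autoencoder(features, n_hidden=k)` for each `k` in `HIDDEN_SIZES` would reproduce the shape of the seven-configuration comparison, where `features` stands for a list of whitened feature vectors supplied by the caller.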