2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
DOI: 10.1109/iros.2015.7354164
Audio-visual beat tracking based on a state-space model for a music robot dancing with humans

Abstract: This paper presents an audio-visual beat-tracking method for an entertainment robot that can dance in synchronization with music and human dancers. Conventional music robots have focused on either music audio signals or dancing movements of humans for detecting and predicting beat times in real time. Since a robot needs to record music audio signals by using its own microphones, however, the signals are severely contaminated with loud environmental noise and reverberant sounds. Moreover, it is difficult to vis…

Cited by 13 publications (13 citation statements, 2016–2021); references 19 publications.
“…Instead, the method keeps all the possibilities of tempos and beat times. If a unique tempo were extracted from music audio signals as in [15], tempo estimation failure would severely degrade the overall performance. We therefore formulate a nonlinear state-space model that has a tempo and a beat time as latent variables and acoustic and skeleton features as observed variables.…”
Section: Proposed Methods (mentioning; confidence: 99%)
“…We compared the proposed audio-visual beat-tracking method with two conventional audio beat-tracking methods [5,15]. The method [5] is implemented in HARK [29] robot audition software, and its parameters are set to the default values except for m = 90.…”
Section: Experimental Conditions (mentioning; confidence: 99%)