This paper presents an audio-visual beat-tracking method for an entertainment robot that can dance in synchronization with music and human dancers. Conventional music robots have focused on either music audio signals or the dancing movements of humans for detecting and predicting beat times in real time. Since a robot needs to record music audio signals with its own microphones, however, the signals are severely contaminated with loud environmental noise and reverberant sounds. Moreover, it is difficult to visually detect beat times from real, complicated dancing movements, which exhibit weaker repetitive characteristics than music audio signals do. To solve these problems, we propose a state-space model that integrates audio and visual information in a probabilistic manner. At each frame, the method extracts acoustic features (audio tempos and onset likelihoods) from music audio signals and skeleton features from the movements of a human dancer. The current tempo and the next beat time are then estimated from these observed features using a particle filter. Experimental results showed that the proposed multi-modal method, which uses a depth sensor (Kinect) for extracting skeleton features, outperformed conventional mono-modal methods by 0.20 (F-measure) in beat-tracking accuracy in a noisy and reverberant environment.
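A rough illustration of the particle-filter stage is sketched below. This is not the authors' implementation: the frame rate, particle count, noise levels, and Gaussian-shaped observation model are assumptions made only for the example. Each particle carries a (tempo, beat-phase) hypothesis that is re-weighted by how well it explains the audio onset likelihood and a visual movement likelihood at the current frame.

```python
# Minimal particle-filter sketch for audio-visual beat tracking.
# NOT the paper's implementation: FPS, particle count, noise levels,
# and the observation model below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
FPS = 100            # feature frames per second (assumed)
N_PARTICLES = 500    # number of hypotheses (assumed)

# Each particle is a hypothesis: tempo in BPM and beat phase in [0, 1).
tempo = rng.uniform(60, 180, N_PARTICLES)
phase = rng.uniform(0, 1, N_PARTICLES)
weights = np.full(N_PARTICLES, 1.0 / N_PARTICLES)

def step(audio_onset_likelihood, visual_onset_likelihood):
    """Advance one frame and fuse both modalities; inputs are scalars in [0, 1]."""
    global tempo, phase, weights
    # Transition: tempo drifts slowly; phase advances according to tempo.
    tempo = np.clip(tempo + rng.normal(0, 0.2, N_PARTICLES), 60, 180)
    phase = (phase + tempo / 60.0 / FPS) % 1.0
    # Observation: particles whose phase is near a beat (phase ~ 0) are
    # rewarded when either modality reports a likely onset at this frame.
    near_beat = np.exp(-0.5 * (np.minimum(phase, 1.0 - phase) / 0.05) ** 2)
    weights *= 1e-3 + near_beat * (audio_onset_likelihood + visual_onset_likelihood)
    weights /= weights.sum()
    # Resample when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < N_PARTICLES / 2:
        idx = rng.choice(N_PARTICLES, N_PARTICLES, p=weights)
        tempo, phase = tempo[idx], phase[idx]
        weights[:] = 1.0 / N_PARTICLES
    est_tempo = float(np.sum(weights * tempo))
    next_beat_sec = float(np.sum(weights * (1.0 - phase))) * 60.0 / est_tempo
    return est_tempo, next_beat_sec
```

In this sketch the two per-frame inputs would correspond to the onset likelihood extracted from the noisy microphone signal and to a periodicity cue derived from the Kinect skeleton features, respectively.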
A system for transferring vocal expressions separately from singing voices with accompaniment to singing voice synthesizers is described. The expressions appear as fluctuations in the fundamental frequency contour of the singing voice, such as vibrato, glissando, and kobushi. The fundamental frequency contour of the singing voice is estimated using subharmonic summation in a limited frequency range and is temporally aligned to a chromatic pitch sequence. Each expression is then transcribed and parameterized in accordance with designed rules. Finally, the expressions are transferred to given scores on the singing voice synthesizer. Experiments demonstrated that the proposed system can transfer the vocal expressions while retaining the singer's individuality on two singing voice synthesizers: Vocaloid and CeVIO.
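The subharmonic-summation step can be sketched as follows; the 1 Hz candidate grid, harmonic count, and decay weight are illustrative assumptions, and restricting the candidate range to the singing-voice register stands in for the limited frequency range mentioned above.

```python
# Sketch of subharmonic summation (SHS) for F0 estimation on one frame.
# Parameters (grid, harmonic count, decay) are assumptions for illustration.
import numpy as np

def shs_f0(frame, sr, fmin=80.0, fmax=1000.0, n_harmonics=8, decay=0.84):
    """Return the F0 candidate (Hz) whose weighted harmonic sum is largest."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    candidates = np.arange(fmin, fmax, 1.0)      # 1 Hz candidate grid (assumed)
    scores = np.zeros(len(candidates))
    for h in range(1, n_harmonics + 1):
        # Sample the magnitude spectrum at each candidate's h-th harmonic
        # and accumulate it with an exponentially decaying weight.
        scores += decay ** (h - 1) * np.interp(
            candidates * h, freqs, spec, left=0.0, right=0.0)
    return candidates[np.argmax(scores)]
```

Running such an estimator per analysis frame yields the raw contour that would then be temporally aligned to the chromatic pitch sequence before the rule-based transcription of vibrato, glissando, and kobushi.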
A method for transcribing vocal expressions such as vibrato, glissando, and kobushi separately from polyphonic music is described. The expressions appear as fluctuations in the fundamental frequency contour of the singing voice. They can be used for music search and retrieval and for expressive singing voice synthesis based on singing style, since they strongly reflect the individuality of the singer. The fundamental frequency contour of the singing voice is estimated using the Viterbi algorithm constrained by a corresponding note sequence. Next, the notes are temporally aligned with the fundamental frequency sequence. Finally, each expression is identified and parameterized in accordance with designed rules. Experiments demonstrated that this method can transcribe expressions in the singing voice from commercial recordings.
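A minimal sketch of note-constrained Viterbi tracking in the spirit of this step is given below; the per-frame salience matrix, semitone-bin state space, and Gaussian-style transition and note penalties are assumptions made for illustration rather than the method's actual model.

```python
# Sketch of Viterbi F0 tracking constrained by a corresponding note sequence.
# The salience input and both penalty terms are illustrative assumptions.
import numpy as np

def viterbi_f0(salience, note_bin, trans_sigma=1.0, note_sigma=2.0):
    """salience: (T, K) per-frame pitch salience over K semitone bins.
    note_bin: (T,) score note per frame, expressed in the same bin index.
    Returns the most likely bin index per frame."""
    T, K = salience.shape
    bins = np.arange(K)
    # Emission: log-salience plus a penalty for straying from the score note.
    emit = (np.log(salience + 1e-9)
            - 0.5 * ((bins[None, :] - note_bin[:, None]) / note_sigma) ** 2)
    # Transition: penalize large frame-to-frame pitch jumps.
    trans = -0.5 * ((bins[:, None] - bins[None, :]) / trans_sigma) ** 2
    delta = emit[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + trans          # scores[i, j]: from bin i to j
        back[t] = np.argmax(scores, axis=0)
        delta = scores[back[t], bins] + emit[t]
    path = np.zeros(T, dtype=int)
    path[-1] = int(np.argmax(delta))
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path
```

Here the note penalty plays the role of the constraint from the corresponding note sequence: the decoded contour stays near the score while remaining free to follow vibrato, glissando, and kobushi deviations.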