The duration of a speech passage can be altered using audio time-scale modification techniques. Time-scale modification can be achieved in the time domain by segmenting the input signal into overlapping frames and recombining the frames with an overlap differing from the analysis overlap. We present a time-scale modification algorithm that uses a simple peak alignment technique to synchronize overlapping synthesis frames. The peak alignment overlap-add (PAOLA) algorithm also takes advantage of waveform properties to ensure a high-quality output for the minimum number of iterations. The new algorithm produces a time-scaled output of approximately equal quality to that of an adaptive implementation of the commercially popular synchronised overlap-add (SOLA) algorithm, but offers a computational saving ranging from a factor of 15 (for a time-scale factor of 0.5) to 170 (for a time-scale factor of 1.1).
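The frame-based recombination described above can be illustrated with a minimal sketch of plain overlap-add. This is not the PAOLA or SOLA algorithm: it contains no peak alignment or cross-correlation step, so it only shows how reading frames at one hop size and writing them at another changes the duration. The function name, the Hann window, the default frame and hop sizes, and the convention that `alpha` is the ratio of output duration to input duration are all assumptions made for illustration.

```python
import math

def ola_time_scale(x, alpha, frame_len=256, analysis_hop=128):
    """Plain overlap-add time scaling (no frame alignment step).

    Frames are read every `analysis_hop` samples and written every
    `synthesis_hop` samples; `alpha` is assumed here to be the ratio
    of output duration to input duration.
    """
    synthesis_hop = max(1, round(alpha * analysis_hop))
    # Periodic Hann window (an illustrative choice)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_frames = max(1, 1 + (len(x) - frame_len) // analysis_hop)
    out_len = (n_frames - 1) * synthesis_hop + frame_len
    out = [0.0] * out_len
    norm = [0.0] * out_len  # summed window weights, used for normalisation
    for m in range(n_frames):
        a = m * analysis_hop   # read position in the input
        s = m * synthesis_hop  # write position in the output
        for n in range(frame_len):
            if a + n < len(x):
                out[s + n] += window[n] * x[a + n]
                norm[s + n] += window[n]
    # Divide by the summed window weights; samples with no weight stay at zero
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]
```

Because the frames are not synchronised before being summed, this sketch generally produces audible artefacts on quasi-periodic signals; aligning the overlapping synthesis frames is the step that algorithms such as SOLA and PAOLA add.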
Time-domain time-scaling algorithms are efficient in comparison to their frequency-domain counterparts, but they rely upon the existence of a quasi-periodic signal to produce a high-quality output. This requirement makes them unsuitable for use on multi-pitched signals such as polyphonic music. However, time-domain techniques applied on a subband basis can resolve the multi-pitch problem. We propose an improved subband implementation based upon the Bark scale for the time-scale modification of music. The new subband approach is supported by psychoacoustic and music theory, and subjectively by informal listening tests.
An approach is presented which generates an audio thumbnail of Irish Traditional music. An audio thumbnail is considered to be the most representative segment of the music. For popular music, the chorus is considered to be an ideal audio thumbnail; however, Irish Traditional music has no chorus. An Irish Traditional tune consists of two or more short structural segments called parts. Parts are repeated to extend the tune, and the tune itself is also repeated once or more in its entirety. To further extend a performance, tunes are concatenated to form a set of tunes. As a result, there is plenty of repetition within this music type. The presented approach builds on an existing method which calculates the structure of Irish Traditional music. The structural information is used to extract a single rendition of each distinctive part. The resulting parts are concatenated to form the audio thumbnail.
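The final two steps, extracting one rendition of each distinctive part and concatenating the results, can be sketched as follows. The sketch assumes the structure analysis has already produced a list of labelled segments; the `(label, start, end)` tuple layout, the function name, and the choice of the first rendition of each part are hypothetical, since the abstract does not say which rendition is selected.

```python
def build_thumbnail(segments):
    """Keep one rendition of each distinct part, in performance order.

    `segments` is an assumed input format: a list of
    (label, start_sec, end_sec) tuples from a prior structure analysis.
    The first occurrence of each label is kept (an illustrative choice).
    """
    seen = set()
    thumbnail = []
    for label, start, end in segments:
        if label not in seen:
            seen.add(label)
            thumbnail.append((label, start, end))
    return thumbnail


# Example: a two-part tune played AABB, then the A part again
structure = [("A", 0, 8), ("A", 8, 16), ("B", 16, 24), ("B", 24, 32), ("A", 32, 40)]
parts = build_thumbnail(structure)
```

The returned time ranges would then be cut from the recording and joined end to end to form the thumbnail audio.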
A framework is presented which addresses the issues related to the real-time implementation of synchronised video and audio time-scale and pitch-scale modification algorithms. It allows for seamless real-time transition between continually varying, independent time-scale and pitch-scale parameters arising as a result of manual or automatic intervention. We describe the problems which arise in a real-time context and provide novel solutions to prevent artefacts, minimise latency, and improve synchronisation. The time and pitch scaling approach is based on a modified phase vocoder with optional phase locking and an integrated transient detector which enables high-quality transient preservation in real time. A novel method for audio/visual synchronisation was implemented in order to ensure no perceptible latency between audio and video while real-time time scaling and pitch shifting are applied. Evaluation results are reported which demonstrate both high audio quality and minimal synchronisation error.