The aim of the MBROLA project, recently initiated by the Faculté Polytechnique de Mons (Belgium), is to obtain a set of speech synthesizers for as many voices, languages and dialects as possible, free to use for non-commercial and non-military applications. The ultimate goal is to boost academic research on speech synthesis, and particularly on prosody generation, known as one of the biggest challenges facing Text-to-Speech synthesizers in the years to come. Central to the MBROLA project is MBROLA 2.00, a speech synthesizer based on the concatenation of diphones. Executable files of this synthesizer have been made freely available for many computers and operating systems, together with a first diphone database for a French male voice. We describe here the terms of participation in the project, as a user, as an associated developer, or as a database provider.
Background: For two decades, EEG-based Brain-Computer Interface (BCI) systems have been widely studied in research labs. Now, researchers want to consider out-of-the-lab applications and make this technology available to everybody. However, medical-grade EEG recording devices are still much too expensive for end-users, especially disabled people. Therefore, several low-cost alternatives have appeared on the market. The Emotiv Epoc headset is one of them. Although some previous work showed that this device could suit the customer’s needs in terms of performance, no quantitative classification-based assessment against a medical system is available. Methods: This paper aims at statistically comparing a medical-grade system, the ANT device, and the Emotiv Epoc headset by determining their respective performances in a P300 BCI using the same electrodes. In addition, a review of previous Emotiv studies and a discussion of practical considerations regarding both systems are provided. Nine healthy subjects participated in this experiment, during which the ANT and the Emotiv systems were used in two different conditions: sitting on a chair and walking on a treadmill at constant speed. Results: The Emotiv headset performs significantly worse than the medical device; observed effect sizes vary from medium to large. The Emotiv headset has higher relative operational and maintenance costs than its medical-grade competitor. Conclusions: Although this low-cost headset is able to record EEG data in a satisfying manner, it should only be chosen for non-critical applications such as games, communication systems, etc. For rehabilitation or prosthesis control, this lack of reliability may lead to serious consequences. For research purposes, the medical system should be chosen unless many trials are available or the Signal-to-Noise Ratio is high. This also suggests that the design of a specific low-cost EEG recording system for critical applications and research is still required.
The pseudo-periodicity of voiced speech can be exploited in several speech processing applications. This requires, however, that the precise locations of the Glottal Closure Instants (GCIs) are available. The focus of this paper is the evaluation of automatic methods for the detection of GCIs directly from the speech waveform. Five state-of-the-art GCI detection algorithms are compared using six different databases with contemporaneous electroglottographic recordings as ground truth, and containing many hours of speech by multiple speakers. The five techniques compared are the Hilbert Envelope-based detection (HE), the Zero Frequency Resonator-based method (ZFR), the Dynamic Programming Phase Slope Algorithm (DYPSA), the Speech Event Detection using the Residual Excitation And a Mean-based Signal (SEDREAMS) and the Yet Another GCI Algorithm (YAGA). The efficacy of these methods is first evaluated on clean speech, both in terms of reliability and accuracy. Their robustness to additive noise and to reverberation is also assessed. A further contribution of the paper is the evaluation of their performance on a concrete application of speech processing: the causal-anticausal decomposition of speech. It is shown that for clean speech, SEDREAMS and YAGA are the best performing techniques, both in terms of identification rate and accuracy. ZFR and SEDREAMS also show a superior robustness to additive noise and reverberation.
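The abstract does not spell out how reliability and accuracy are scored. As a minimal sketch, assuming a common larynx-cycle formulation (each reference GCI owns the interval reaching halfway to its neighbouring reference GCIs), detections could be scored against the electroglottographic ground truth as follows. The function name `evaluate_gci`, the dictionary keys and the example timings are hypothetical and are not taken from the paper.

```python
from statistics import pstdev


def evaluate_gci(reference, detected):
    """Score detected GCIs against reference GCIs (times in seconds).

    Assumed scoring rule: a cycle with exactly one detection is
    identified, a cycle with none is a miss, and a cycle with more
    than one is a false alarm.
    """
    reference = sorted(reference)
    detected = sorted(detected)
    if len(reference) < 3:
        raise ValueError("need at least three reference GCIs")

    identified_errors = []
    misses = 0
    false_alarms = 0
    # The first and last reference GCIs are skipped: their cycles are open-ended.
    for prev, cur, nxt in zip(reference, reference[1:], reference[2:]):
        lo = (prev + cur) / 2.0
        hi = (cur + nxt) / 2.0
        hits = [t for t in detected if lo <= t < hi]
        if len(hits) == 1:
            identified_errors.append(hits[0] - cur)
        elif not hits:
            misses += 1
        else:
            false_alarms += 1

    cycles = len(reference) - 2
    return {
        "identification_rate": len(identified_errors) / cycles,
        "miss_rate": misses / cycles,
        "false_alarm_rate": false_alarms / cycles,
        # Spread of the timing error, computed over identified cycles only.
        "timing_error_std": pstdev(identified_errors) if identified_errors else None,
    }


# Made-up timings for illustration: 6 reference GCIs give 4 scored cycles.
reference = [0.00, 0.01, 0.02, 0.03, 0.04, 0.05]
detected = [0.0102, 0.0298, 0.0310, 0.0401]
scores = evaluate_gci(reference, detected)
```

In this toy input, two cycles are identified, one is missed and one contains two detections, so the three rates sum to one. A real evaluation would also need to compensate for the propagation delay between the electroglottographic and speech signals, which this sketch ignores.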