The aim of this contribution is to propose a database model designed for the storage and accessibility of various speech disorder data, including signals, clinical evaluations and patients' information. This model is the result of 15 years of experience in the management and analysis of this type of data. We present two important French corpora of voice and speech disorders that we have been recording in hospitals in Marseilles (MTO corpus) and Aix-en-Provence (AHN corpus). The population consists of 2500 dysphonic, dysarthric and control subjects, which, as far as we know, currently makes this one of the largest corpora of "pathological" speech. The originality of this data lies in the presence of physiological data (such as oral airflow or estimated sub-glottal pressure) associated with acoustic recordings. This activity led us to ask how the sound, physiological and clinical data can be managed at such a scale. Consequently, we developed a database model that we present here. Recommendations and technical solutions based on MySQL, a relational database management system, are discussed.
Within the framework of the Carcinologic Speech Severity Index (C2SI) INCa Project, we collected a large database of French speech recordings aimed at validating Disorder Severity Indexes. Such a database will be useful for measuring the impact of oral and pharyngeal cavity cancer on speech production. It will make it possible to assess patients' quality of life after treatment. The database is composed of audio recordings from 134 sessions and associated metadata. Several intelligibility and comprehensibility levels of speech functions have been evaluated. Acoustics and prosody have been assessed. Perceptual evaluation ratings from both naive and expert juries are being produced. Automatic analyses are being carried out. The intention is to provide speech therapists and physicians with objective tools that take into account the intelligibility and comprehensibility of patients who have received cancer treatment (surgery and/or radiotherapy and/or chemotherapy). The aim of this paper is to justify the necessity of such a corpus and to present its data collection. This C2SI corpus will be available to the scientific community through the Scientific Interest Group Parolotheque.