Optimizing the coverage of a speech database through a selection of representative speaker recordings

Krstulović, Sacha; Bimbot, Frédéric; Boëffard, Olivier; Charlet, Delphine; Fohr, Dominique; Mella, Odile

doi:10.1016/j.specom.2006.07.002

Cited by 7 publications

(3 citation statements)

References 22 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…The phonological annotation of the Gutenberg corpus comes from the Arctic/ Festvox database (see Kominek and Black 2003), and the annotation of the Le-Monde corpus is a by-product of the Neologos project, detailed by Krstulović et al (2006). For each corpus, we have collected every phoneme, diphoneme, triphoneme, and their occurrences in each sentence so as to define the set U of units to cover and the matrix A.…”

Section: Methodsmentioning

confidence: 99%

“…In François and Boëffard (2001), the methodology gives a priority to the rarest categories of allophones. The latter methodology has been implemented for the definition of the multi-speaker corpus Neologos in Krstulović et al (2006). In the article of Krul et al (2006), the authors constructed a corpus where the distribution of diphonemes/triphonemes matches a uniform distribution.…”

Section: Introductionmentioning

confidence: 99%

See 1 more Smart Citation

Large Linguistic Corpus Reduction with SCP Algorithms

Barbot

Boëffard

Chevelu

et al. 2015

Computational Linguistics

Self Cite

View full text Add to dashboard Cite

Linguistic corpus design is a critical concern for building rich annotated corpora useful in different domains of applications. For example, speech technologies such as ASR (Automatic Speech Recognition) or TTS (Text-to-Speech) need a huge amount of speech data to train data-driven models or to produce synthetic speech. Collecting data is always related to costs (recording speech, verifying annotations, etc.), and as a rule of thumb, the more data you gather, the more costly your application will be. Within this context, we present in this article solutions to reduce the amount of linguistic text content while maintaining a sufficient level of linguistic richness required by a model or an application. This problem can be formalized as a Set Covering Problem (SCP) and we evaluate two algorithmic heuristics applied to design large text corpora in English and French for covering phonological information or POS labels. The first considered algorithm is a standard greedy solution with an agglomerative/spitting strategy and we propose a second algorithm based on Lagrangian relaxation. The latter approach provides a lower bound to the cost of each covering solution. This lower bound can be used as a metric to evaluate the quality of a reduced corpus whatever the algorithm applied. Experiments show that a suboptimal algorithm like a greedy algorithm achieves good results; the cost of its solutions is not so far from the lower bound (about 4.35% for 3-phoneme coverings). Usually, constraints in SCP are binary; we proposed here a generalization where the constraints on each covering feature can be multi-valued.

show abstract

Section: Methodsmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

Large Linguistic Corpus Reduction with SCP Algorithms

Barbot

Boëffard

Chevelu

et al. 2015

Computational Linguistics

Self Cite

View full text Add to dashboard Cite

show abstract

“…A phonetically rich and balanced database is required to train the HMM. In order to improve the system performance, the database used to train the acoustic model should be large enough covering all possible inter-speaker and intra-speaker variability (Krstulovic et al 2006). Another important issue is the selection of most basic modeling units representing the salient acoustic and phonetic informations of the language for which the system is to be developed.…”

Section: Acoustic Modelsmentioning

confidence: 99%

Acoustic modeling problem for automatic speech recognition system: conventional methods (Part I)

Aggarwal

Dave

2011

Int J Speech Technol

View full text Add to dashboard Cite

In automatic speech recognition (ASR) systems, the speech signal is captured and parameterized at front end and evaluated at back end using the statistical framework of hidden Markov model (HMM). The performance of these systems depend critically on both the type of models used and the methods adopted for signal analysis. Researchers have proposed a variety of modifications and extensions for HMM based acoustic models to overcome their limitations. In this review, we summarize most of the research work related to HMM-ASR which has been carried out during the last three decades. We present all these approaches under three categories, namely conventional methods, refinements and advancements of HMM. The review is presented in two parts (papers): (i) An overview of conventional methods for acoustic phonetic modeling, (ii) Refinements and advancements of acoustic models. Part I explores the architecture and working of the standard HMM with its limitations. It also covers different modeling units, language models and decoders. Part II presents a review on the advances and refinements of the conventional HMM techniques along with the current challenges and performance issues related to ASR.

show abstract

Selecting Representative Speakers for a Speech Database on the Basis of Heterogeneous Similarity Criteria

Krstulović

Bimbot

Boëffard

et al.

Lecture Notes in Computer Science

View full text Add to dashboard Cite

In the context of the Neologos French speech database creation project, a general methodology was defined for the selection of representative speaker recordings. The selection aims at providing a good coverage in terms of speaker variability while limiting the number of recorded speakers. This is intended to make the resulting database both more adapted to the development of recently proposed multi-model methods and less expensive to collect. The presented methodology proposes a selection process based on the optimization of a quality criterion defined in a variety of speaker similarity modeling frameworks. The selection can be achieved with respect to a unique similarity criterion, using classical clustering methods such as Hierarchical or K-Medians clustering, or it can combine several speaker similarity criteria, thanks to a newly developed clustering method called Focal Speakers Selection. In this framework, four different speaker similarity criteria are tested, and three different speaker clustering algorithms are compared. Results pertaining to the collection of the Neologos database are also discussed

show abstract

Optimizing the coverage of a speech database through a selection of representative speaker recordings

Cited by 7 publications

References 22 publications

Large Linguistic Corpus Reduction with SCP Algorithms

Large Linguistic Corpus Reduction with SCP Algorithms

Acoustic modeling problem for automatic speech recognition system: conventional methods (Part I)

Selecting Representative Speakers for a Speech Database on the Basis of Heterogeneous Similarity Criteria

Contact Info

Product

Resources

About