Abstract: Several intrusive measures of reverberation can be computed from measured and simulated room impulse responses, either over the full frequency band or for each individual mel-frequency subband. It is first shown that the full-band clarity index C50 is, on average, the measure most correlated with reverberant speech recognition performance. This corroborates previous findings, now for the dataset used in this study. We extend those findings by showing that C50 also exhibits the highest mutual information on average. Motivated by these results, a non-intrusive room acoustic (NIRA) estimation method is proposed to estimate C50 from the reverberant speech signal alone. The NIRA method is a data-driven approach: it computes a number of features from the speech signal and uses them to train a model that performs the estimation. The choice of features and learning techniques is explored using an evaluation set of approximately 100,000 different reverberant signals (around 93 hours of speech), with reverberation from both measured and simulated room impulse responses. The importance of each feature for estimating the target C50 is analysed following two different approaches. In both cases the newly chosen set of features shows high importance for the target. The best C50 estimator achieves a root mean square deviation of around 3 dB on average across all reverberant test environments.
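As a point of reference for the intrusive measure named above, C50 is conventionally defined as the ratio, in dB, of the impulse-response energy arriving within the first 50 ms to the energy arriving afterwards. The sketch below illustrates that definition only; it is not the paper's code. The function name `clarity_index_c50` and the choice to take the largest absolute sample as the direct-path onset are assumptions made for illustration.

```python
import math


def clarity_index_c50(rir, fs, early_ms=50.0):
    """Full-band clarity index in dB from a room impulse response.

    rir      : sequence of impulse-response samples
    fs       : sampling rate in Hz
    early_ms : early/late boundary in milliseconds (50 ms for C50)

    Assumption: the direct path is the largest absolute sample, and the
    early window is measured from that sample.
    """
    if len(rir) == 0:
        raise ValueError("impulse response is empty")

    # Locate the assumed direct-path arrival.
    onset = max(range(len(rir)), key=lambda i: abs(rir[i]))
    split = onset + int(round(early_ms * 1e-3 * fs))

    # Energy before and after the early/late boundary.
    early = sum(x * x for x in rir[onset:split])
    late = sum(x * x for x in rir[split:])

    if late == 0.0:
        return math.inf   # no late energy at all
    if early == 0.0:
        return -math.inf  # no early energy at all
    return 10.0 * math.log10(early / late)
```

For a noiseless exponentially decaying response `h[n] = exp(-n / (tau * fs))` with `tau = 0.1` s, the early-to-late energy ratio approaches `e - 1`, so the function should return roughly `10 * log10(e - 1)` dB.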
INTRODUCTION At the start of reforestation activities in Brazil, few companies used mechanization in forest harvesting operations. From the 1990s onward, with the opening of imports, the rising cost of labor, and the need to carry out the work more ergonomically and more efficiently, companies began to mechanize harvesting operations more intensively (Machado 2008).
This paper presents a formant frequency tracking algorithm for continuous speech processing. The first stage uses spectral information to generate frequency candidates. For this purpose, two methods were tested: the roots of the Linear Predictive Coding (LPC) polynomial and peak picking on the Chirp Group Delay (CGD) function. The second stage is a beam-search algorithm that tries to find the best sequence of formants given the frequency candidates, applying a cost function based on local and global evidence. The main advantage of this beam-search algorithm over previous dynamic programming approaches is that a trajectory function spanning several frames can be incorporated into the cost function. The performance was evaluated against a labeled formant database and the Wavesurfer formant tracker, with promising results.
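The second stage described above can be illustrated with a deliberately simplified sketch: a beam search that tracks a single formant through per-frame candidate lists. Everything specific here is an assumption made for illustration, not the paper's method: the function name `beam_search_track`, the `(frequency, local_cost)` candidate format, and the transition cost, which penalizes only the frequency jump between adjacent frames.

```python
def beam_search_track(candidates, beam_width=3, jump_weight=1.0):
    """Pick one frequency per frame with a beam search.

    candidates  : list of frames; each frame is a non-empty list of
                  (freq_hz, local_cost) tuples
    beam_width  : number of partial paths kept after each frame
    jump_weight : weight of the hypothetical continuity penalty

    Returns (path, total_cost) for the best surviving hypothesis.
    """
    if not candidates or any(len(frame) == 0 for frame in candidates):
        raise ValueError("every frame needs at least one candidate")

    # Each hypothesis is (accumulated_cost, list_of_chosen_frequencies).
    beam = [(cost, [freq]) for freq, cost in candidates[0]]
    beam.sort(key=lambda item: item[0])
    beam = beam[:beam_width]

    for frame in candidates[1:]:
        expanded = []
        for total, path in beam:
            for freq, local in frame:
                # Hypothetical transition cost: frequency jump in kHz.
                step = jump_weight * abs(freq - path[-1]) / 1000.0
                expanded.append((total + local + step, path + [freq]))
        # Prune to the cheapest hypotheses.
        expanded.sort(key=lambda item: item[0])
        beam = expanded[:beam_width]

    best_cost, best_path = beam[0]
    return best_path, best_cost
```

Because each hypothesis carries its full path, a cost term that looks at several past frames (for example, one computed from `path[-k:]`) could replace the adjacent-frame penalty without changing the search loop. That is one way to read the advantage the abstract claims over dynamic programming.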