This paper presents a novel, computationally efficient voice activity detection (VAD) algorithm and emphasizes the importance of such algorithms in distributed speech recognition (DSR) systems. When VAD algorithms are used in telecommunication systems, the required capacity of the speech transmission channel can be reduced by transmitting only the speech parts of the signal. A similar objective can be adopted in DSR systems, where the nonspeech parameters are not sent over the transmission channel. A novel approach is proposed for VAD decisions based on mel-filter bank (MFB) outputs combined with the so-called hangover criterion. Comparative tests are presented between the proposed MFB VAD algorithm and three VAD algorithms used in the G.729, G.723.1, and DSR (advanced front-end) standards. These tests were performed on the Aurora 2 database at different signal-to-noise ratios (SNRs). In the speech recognition tests, the proposed MFB VAD outperformed all three VAD algorithms used in the standards: by 14.19% relative (G.723.1 VAD), by 12.84% relative (G.729 VAD), and by 4.17% relative (DSR VAD), across all SNRs.
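The abstract does not give the decision rule itself, so the following is only a minimal sketch of a generic hangover scheme, not the authors' MFB VAD. It assumes one scalar energy value per frame (for example, the summed mel-filter bank outputs) and a fixed threshold; the names `vad_with_hangover`, `threshold`, and `hangover_frames` are hypothetical.

```python
def vad_with_hangover(frame_energies, threshold, hangover_frames):
    """Return one speech/nonspeech flag per frame.

    Assumption: each entry of frame_energies is a scalar summary of a frame
    (e.g. summed mel-filter bank outputs). A frame is marked as speech when
    its energy exceeds the threshold, and the next `hangover_frames` frames
    below the threshold are still kept as speech so that weak word endings
    are not cut off.
    """
    decisions = []
    remaining_hangover = 0
    for energy in frame_energies:
        if energy > threshold:
            decisions.append(True)
            remaining_hangover = hangover_frames  # re-arm the hangover
        elif remaining_hangover > 0:
            decisions.append(True)  # below threshold, but inside hangover
            remaining_hangover -= 1
        else:
            decisions.append(False)
    return decisions


if __name__ == "__main__":
    energies = [0.1, 2.0, 2.5, 0.2, 0.1, 0.1, 0.1]
    print(vad_with_hangover(energies, threshold=1.0, hangover_frames=2))
```

In a DSR setting, only the frames flagged as speech would be passed on to the transmission channel; how the threshold is estimated and adapted to noise is left open here.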
This paper addresses the problem of modeling and dimensioning the capacity of input connections required to serve the calls entering a system. The main aim of the research is to find the optimum number of input server connections that minimizes the number of rejected requests for a given maximum number of expected calls in a specific time interval, i.e., at peak hour. With the results obtained, we aim to model and optimize the planning and dimensioning of the processing server and to reduce its cost, since hiring an input line represents a substantial expense. It is therefore necessary to first determine, using methods of statistical modeling, how many input connections are needed to serve a certain quota of users at a specific moment. On the basis of the results obtained, we can then assess whether a certain segment has too many or too few input connections. The objective of the presented multiple-input and multiple-output (MIMO) simulator is to raise the level and quality of service while lowering the cost of hiring input connections. This paper presents the key segments composing the call and server system (ordinary, lamer and dummy caller models, the statistical Gaussian curve of the call distribution, mechanisms for accepting and rejecting calls, management of input connection capacity, random call triggering, etc.). These segments represent the models and sub-models of the simulator and have been derived using methods of statistical modeling. The optimum solution can be found manually, or automatically by automating the simulation runs and incrementing or decrementing the parameter that sets the number of input connections into the system.
Searching for the optimum number of input connections manually is an entirely empirical method: the user changes this parameter by hand and looks for a scenario in which the number of rejected calls reported by the simulator is minimal. In an automatic search, the simulator generates a series of runs, incrementing or decrementing the parameter in each run, and thus finds the optimum solution on its own. This paper also presents an automatic analysis of the simulation runs and a final statistical report, which includes a conclusion on the results obtained in the different scenarios.
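The automatic search can be illustrated with a small sketch. It is not the paper's simulator: it assumes Gaussian-distributed call arrival times around a peak, exponentially distributed call durations (an assumption not stated in the source), and a simple upward search that stops at the first number of input connections whose rejection count is within a tolerance. All function and parameter names are hypothetical.

```python
import heapq
import random


def simulate_rejections(num_lines, num_calls, peak_time=3600.0,
                        spread=600.0, mean_duration=120.0, seed=0):
    """Count the calls rejected when only `num_lines` input connections exist."""
    rng = random.Random(seed)
    # Assumption: arrival times follow a Gaussian curve centred on the peak.
    arrivals = sorted(rng.gauss(peak_time, spread) for _ in range(num_calls))
    # Assumption: call durations are exponential with the given mean.
    durations = [rng.expovariate(1.0 / mean_duration) for _ in range(num_calls)]

    busy_until = []  # min-heap of times at which occupied lines are released
    rejected = 0
    for start, duration in zip(arrivals, durations):
        # Free every line whose call has ended before this call arrives.
        while busy_until and busy_until[0] <= start:
            heapq.heappop(busy_until)
        if len(busy_until) < num_lines:
            heapq.heappush(busy_until, start + duration)  # call accepted
        else:
            rejected += 1  # all input connections are busy
    return rejected


def find_min_lines(num_calls, max_rejected=0, max_lines=1000, **sim_kwargs):
    """Increment the number of lines until rejections fall within tolerance."""
    for num_lines in range(1, max_lines + 1):
        if simulate_rejections(num_lines, num_calls, **sim_kwargs) <= max_rejected:
            return num_lines
    return None  # no setting up to max_lines met the tolerance


if __name__ == "__main__":
    lines = find_min_lines(num_calls=500, max_rejected=0, seed=1)
    print("smallest number of input connections found:", lines)
```

Because the seed is fixed, every run in the search sees the same call pattern, so the runs differ only in the number of input connections. A fuller study would repeat the search over several seeds and scenarios and summarize the results, in the spirit of the statistical report mentioned above.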