Abstract-Audio quality in the Internet can be strongly affected by network conditions. As a consequence, many techniques to evaluate it have been developed. In particular, in 2001 the ITU-T adopted a technique called Perceptual Evaluation of Speech Quality (PESQ) to automatically measure speech quality. PESQ is a well-known and widely used procedure that in general provides an accurate evaluation of perceptual quality by comparing the original and received voice sequences. One obvious inherent limitation of PESQ is thus that it requires the original signal (called the reference) to make its evaluation. This precludes the use of PESQ for assessing perceived quality in real time, as the reference is in general not available. In this paper, we describe a procedure for estimating the PESQ output using only measures taken on the network state and properties of the communication system, without any use of the reference. It is based on statistical learning techniques. Specifically, we rely on recent ideas for learning with a specific type of neural network known as Echo State Networks (ESNs), a member of the class of Reservoir Computing systems. These tools have proven to be very efficient and robust in many learning tasks. The experimental results show the good accuracy of the resulting procedure and its capability to estimate speech quality in a real-time context. This allows our measuring modules to be embedded in future Internet applications or services based on voice transmission, for instance for control purposes.

Index Terms-Quality assessment, speech quality, echo state networks, reservoir computing.
Manuscript received March 1, 2013; revised April 15, 2013. This work was supported in part by the European Celtic Project "QuEEN".
S. Basterrech is with the University of Rennes 2, Rennes, France (e-mail: Sebastian.Basterrechtiscordio@etudiant.uhb.fr).
G. Rubino is with the National Institute for Research in Computer Science and Control (INRIA Rennes - Bretagne Atlantique), Rennes, France (e-mail: Gerardo.Rubino@inria.fr).

I. INTRODUCTION

Measuring the quality of a voice signal transmitted over the Internet is an important topic today, and one of the main available tools for this purpose is the Perceptual Evaluation of Speech Quality (PESQ) method, accepted in 2001 as the ITU-T objective speech quality measurement standard P.862 [1]. Network conditions vary over time, and in many contexts several different factors lead to losses, which in turn degrade the quality perceived by users. PESQ analyzes this quality by comparing the received signal with the original speech sequence. For this reason, we say that it is a "full reference" technique, the reference being the original signal. Researchers in many areas use PESQ, and the tool has been widely adopted in commercial measurement products. Recently, the ITU started to update its voice assessment recommendations by promoting the new P.863 standard Perceptual Objective Listening Quality Asse...