In this paper, we continue our previous work on improving Bandwidth Extension (BWE) of narrowband speech. We have shown that incorporating memory into the parametrization frontend (through delta features) results in higher highband certainty irrespective of feature type, with MFCCs generally exhibiting higher correlation between the two bands, reaching twice that obtained using LSFs. By incorporating memory into the frontend of a conventional LP-based BWE system, we were able to translate the higher correlation due to memory into BWE performance improvement. Using a high-resolution inverse DCT, we also achieved high-quality speech reconstruction from MFCCs, thus enabling MFCC-based BWE with improved performance compared to conventional static LP-based BWE. We continue this work by incorporating the superior correlation properties of frontend memory into our MFCC-based BWE system. Log-Spectral Distortion, as well as the more perceptually correlated Itakura-based measures, shows that incorporating memory into our MFCC-based BWE system results in BWE performance superior to that of our dynamic LP-based BWE system.

Index Terms: Bandwidth extension, memory inclusion, high-resolution IDCT, highband certainty, mutual information
BACKGROUND

In traditional telephone networks, speech bandwidth is limited to the 0.3-3.4 kHz range. As a result, narrowband speech has sound quality inferior to that of its wideband counterpart and has reduced intelligibility, especially for consonant sounds. Wideband speech reconstruction through Bandwidth Extension (BWE) attempts to regenerate the highband (3.4-7 kHz) signal lost during the filtering processes employed in traditional networks, thereby providing backward compatibility with existing networks. BWE is based on the assumption that narrowband speech correlates with the highband signal; thus, given some a priori information about the nature of this correlation, the higher-frequency speech content can be estimated from the available narrow band alone. Most BWE schemes use either codebook mapping or statistical modelling to perform this estimation.

Since BWE performance closely follows the correlation available between representations of the narrow and high frequency bands, the premise of our work has been to quantify this correlation for different speech representations in order to adopt those representations with the greatest potential for BWE performance improvement. In our previous work, first introduced in [1] and later extended in [2], we made use of the concept of highband certainty (certainty about the high band given the narrow band), defined in [3] as the ratio of the Mutual Information (MI) between the two bands to the discrete entropy of the high band, in order to quantify the correlation between speech frequency bands. Through highband certainty, we investigated the effect of including memory in the frontend on the resulting correlation (by using delta features in addition to the conventional static features, which make no use of the considerable temporal correlation properties of speech), as ...
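
To make the definition of highband certainty concrete, the following is a minimal sketch of the ratio I(X;Y)/H(Y), where X stands for the narrow band and Y for the high band. It assumes, purely for illustration, that each frame of both bands has already been vector-quantized to an integer codebook index, and it uses simple plug-in (histogram) estimates; the function names are hypothetical, and the estimator actually used in [3] may differ.

```python
import numpy as np


def entropy_bits(labels):
    """Plug-in estimate of the discrete entropy H (in bits) of a label sequence."""
    _, counts = np.unique(np.asarray(labels), return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def mutual_information_bits(x, y):
    """Plug-in estimate of I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    pairs = np.stack([np.asarray(x), np.asarray(y)], axis=1)
    # Count occurrences of each distinct (x, y) pair to get the joint distribution.
    _, counts = np.unique(pairs, axis=0, return_counts=True)
    p = counts / counts.sum()
    h_xy = float(-np.sum(p * np.log2(p)))
    return entropy_bits(x) + entropy_bits(y) - h_xy


def highband_certainty(narrowband_idx, highband_idx):
    """Ratio of MI between the bands to the discrete entropy of the high band.

    Inputs are assumed (for this sketch) to be per-frame codebook indices.
    """
    h_high = entropy_bits(highband_idx)
    if h_high == 0.0:
        raise ValueError("high band has zero entropy; certainty is undefined")
    return mutual_information_bits(narrowband_idx, highband_idx) / h_high
```

Under these assumptions the ratio lies between 0 and 1: it is 0 when the two index sequences are statistically independent and 1 when the highband index is fully determined by the narrowband index.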