Compared with acoustic microphone (AM) speech, boneconducted microphone (BCM) speech is much immune to background noise, but suffers from severe loss of information due to the characteristics of the human-body transmission channel. In this letter, a new method for the speaker-dependent BCM speech enhancement is proposed, in which we focus our attention on the spectra restoration of the distorted speech. In order to better infer the missing components, an attention-based bidirectional Long Short-Term Memory (AB-BLSTM) is designed to optimize the use of contextual information to model the relationship between the spectra of BCM speech and its corresponding clean AM speech. Meanwhile, a structural error metric, Structural SIMilarity (SSIM) metric, originated from image processing is proposed to be the loss function, which provides the constraint of the spectro-temporal structures in recovering of the spectra. Experiments demonstrate that compared with approaches based on conventional DNN and mean square error (MSE), the proposed method can better recover the missing phonemes and obtain spectra with spectro-temporal structure more similar to the target one, which leads to great improvement on objective metrics.