Total laryngectomy (TL) is a well-established treatment for advanced laryngeal malignancies that entails the complete removal of the larynx. Speech rehabilitation following TL is crucial for improving quality of life (QoL) and facilitating social reintegration. Electrolaryngeal (EL) speech, a widely used voice restoration technique that relies on external excitation signals, enables patients to form lengthy sentences but often sounds artificial and monotonous. Efforts to enhance EL speech include statistical voice conversion (VC) and neural approaches to speech enhancement. These approaches typically aim to map spectral features into acoustic characteristics, including the fundamental frequency (F0). However, challenges arise due to substantial discrepancies and pattern changes between the features extracted for EL and normal speech, compounded by limited clinical training data. To address this issue, we explored F0 pattern prediction based on frame-wise phoneme information using bidirectional long short-term memory (BiLSTM) recurrent neural networks. Beyond direct predictions based on phoneme labels, we expanded our analysis to include real-valued phoneme embeddings and conducted predictions for clustered embeddings, which serve as lower-dimensional input representations. Our findings demonstrate that both regression and classification predictive modeling can map frame-wise phoneme information into natural F0 patterns. Additionally, phoneme labels can be considered shared features between EL and normal speech, allowing prediction accuracy to be improved by incorporating phoneme information from normal speech into the training sets for EL speech.
Furthermore, by learning phoneme embeddings and deriving the input features for F0 prediction from clusters of these embeddings, accurate F0 patterns can be predicted, and the challenge of finding a strategy to reduce the dimensionality of the input features is effectively alleviated.

INDEX TERMS Electrolaryngeal speech, fundamental frequency prediction, phoneme labels, phoneme embeddings, speech enhancement.

Following TL, the pharynx is decoupled from the trachea, and inhalation and exhalation occur through an opening in