Sound synthesis is the most representative method of electronic music creation; electronic music is, in effect, a product of sound synthesis technology. Today, the field of electronic music spans recording, mixing, composing, and producing, and it offers some advantages over traditional music composition. Voice is the most direct and effective means of communication between people, and with the rapid development of speech recognition technology, the recognition rate of speech recognition systems in near-field environments has improved greatly. In practical applications, however, there is often a large amount of ambient noise. Strong ambient noise seriously degrades the quality, accuracy, and speed of music synthesis, and it reduces both the sound quality and clarity of speech and the speed of speech recognition. To address these problems, this paper proposes a multisensor speech enhancement technique and implements a multisensor speech enhancement system. It also proposes an enhancement method based on speaker speech and microphone speech. The low-frequency harmonic components of the bone conduction signal are used to replace the frequency points disturbed by wind noise, which reduces the influence of wind noise on speech quality and intelligibility. Experimental results show that the improved algorithm achieves PESQ and MOS scores of 1.65 and 3.67, respectively, a substantial improvement over existing methods. The approach can effectively improve the voice quality of the music synthesizer and reduce background noise.
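The replacement of wind-disturbed frequency points by bone conduction components can be illustrated with a minimal single-frame sketch. It is not the paper's algorithm: the abstract does not say how disturbed points are detected, so the sketch assumes a simple magnitude-ratio test below a cutoff frequency. The function name `replace_wind_bins` and the parameters `cutoff_hz` and `ratio_threshold` are hypothetical and chosen only for illustration.

```python
import numpy as np


def replace_wind_bins(mic_frame, bone_frame, sample_rate=16000,
                      cutoff_hz=1000.0, ratio_threshold=2.0):
    """Replace low-frequency microphone bins flagged as wind-disturbed
    with the corresponding bone conduction bins (illustrative sketch).

    Returns the fused one-sided spectrum and the boolean mask of replaced bins.
    """
    mic_frame = np.asarray(mic_frame, dtype=float)
    bone_frame = np.asarray(bone_frame, dtype=float)
    if mic_frame.shape != bone_frame.shape:
        raise ValueError("mic_frame and bone_frame must have the same length")

    n = mic_frame.shape[0]
    window = np.hanning(n)

    # One-sided spectra of the two time-aligned sensor frames.
    mic_spec = np.fft.rfft(mic_frame * window)
    bone_spec = np.fft.rfft(bone_frame * window)
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)

    # Assumption: wind noise is concentrated below cutoff_hz.
    low_band = freqs <= cutoff_hz

    # Assumption: a bin counts as wind-disturbed when the microphone
    # magnitude exceeds the bone conduction magnitude by ratio_threshold.
    eps = 1e-12
    disturbed = low_band & (
        np.abs(mic_spec) > ratio_threshold * (np.abs(bone_spec) + eps)
    )

    # Keep microphone bins everywhere except at the disturbed points.
    fused_spec = np.where(disturbed, bone_spec, mic_spec)
    return fused_spec, disturbed


# Usage: a 250 Hz "speech" tone on both sensors, plus a strong 62.5 Hz
# "wind" component that reaches only the air microphone.
t = np.arange(512) / 16000.0
speech = np.sin(2 * np.pi * 250.0 * t)
wind = 5.0 * np.sin(2 * np.pi * 62.5 * t)
fused, mask = replace_wind_bins(speech + wind, speech)
```

In this toy case the bin at 62.5 Hz is taken from the bone conduction frame, while the bin at 250 Hz and all bins above the cutoff keep the microphone values. A real system would also need frame overlap, level matching between the two sensors, and a more robust wind detector.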