Phonation into a tube whose distal end is either in the air or submerged in water is used in voice therapy. This study explores the mechanisms that make these therapy methods effective. A physical model, complemented by calculations from a computational model, was applied, and the results were compared with those reported for humans. The effects of tube phonation on vocal tract resonances and on oral pressure variation were studied, and the relationships between the time variation of the transglottal pressure Ptrans(t) and the glottal area GA(t) were constructed. The physical model revealed that, for phonation on the vowel [u:] through a glass resonance tube ending in the air, the first formant frequency F1 decreased by 67%, from 315 Hz to 105 Hz, i.e., to slightly above the fundamental frequency F0, which was set to 90-94 Hz. For phonation through the tube into water, F1 decreased by 91-92%, to 26-28 Hz, and the water bubbling frequency Fb≅19-24 Hz lay just below F1. The Ptrans(t) vs. GA(t) relationships clearly distinguish vowel phonation from both therapy methods and reveal a physical basis for voice therapy with tubes. It is shown that comparable results have been measured in humans during tube therapy. For the tube in air, F1 descends closer to F0, whereas for the tube in water, Fb occurs close to the acoustic-mechanical resonance of the human vocal tract. In both therapy methods, part of the airflow energy required for phonation is replaced by acoustic energy through use of the first acoustic resonance. Consequently, less flow energy is needed for vocal fold vibration, which improves vocal efficiency. The effect can be stronger in water resistance therapy if Fb approaches the acoustic-mechanical resonance of the vocal tract while F0 is simultaneously changed voluntarily to lie close to F1.
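The percentage decreases of F1 quoted above follow directly from the reported frequencies. The minimal Python sketch below uses only the values stated in this abstract to reproduce them. It also shows how close the lowered F1 lies to F0 in the tube-in-air case and checks that Fb stays below F1 in the tube-in-water case. The function and variable names are illustrative only and are not taken from the study's models.

```python
# Values reported in the abstract (physical model, phonation on [u:]).
F1_VOWEL_HZ = 315.0               # F1 for the vowel without a tube
F1_TUBE_AIR_HZ = 105.0            # F1 with the glass resonance tube ending in air
F1_TUBE_WATER_HZ = (26.0, 28.0)   # F1 range with the tube end submerged in water
FB_WATER_HZ = (19.0, 24.0)        # water bubbling frequency range
F0_HZ = (90.0, 94.0)              # fundamental frequency range set in the model


def percent_decrease(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


air_drop = percent_decrease(F1_VOWEL_HZ, F1_TUBE_AIR_HZ)
water_drops = [percent_decrease(F1_VOWEL_HZ, f) for f in F1_TUBE_WATER_HZ]

# Ratio of the lowered F1 to F0 for the tube in air:
# values just above 1 mean F1 sits slightly above F0.
air_ratios = [F1_TUBE_AIR_HZ / f0 for f0 in F0_HZ]

# For the tube in water, the highest Fb should still be below the lowest F1.
fb_below_f1 = max(FB_WATER_HZ) < min(F1_TUBE_WATER_HZ)

print(f"tube in air:   F1 drop = {air_drop:.1f} %")
print(f"tube in water: F1 drop = {min(water_drops):.1f}-{max(water_drops):.1f} %")
print(f"F1/F0 (air):   {min(air_ratios):.2f}-{max(air_ratios):.2f}")
print(f"Fb below F1 (water): {fb_below_f1}")
```

Rounded to whole percentages, the computed drops match the 67% and 91-92% given in the abstract. The F1/F0 ratio of roughly 1.1 to 1.2 quantifies "slightly above F0".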