Acoustic data have long been used in fundamental voice investigations because they are easily obtained with a microphone. However, acoustic signals alone reveal little about the complex interplay between sound waves, structural surface waves, mechanical vibrations, and fluid flow involved in phonation. Over the past ten years, high-speed imaging techniques have provided a wealth of information about the mechanical deformation of the superior surface of the larynx during phonation. Time-resolved imaging of the inner structure of the deformable soft tissues is not yet feasible because of low temporal resolution (MRI and ultrasound) and x-ray dose-related hazards (CT and standard x-ray). One possible approach to circumvent these challenges is to use mathematical models that reproduce observable behavior such as phonation frequency, closed quotient, onset pressure, jitter, shimmer, radiated sound pressure, and airflow. Mathematical models of phonation range in complexity from systems with relatively few degrees of freedom (multi-mass models) to models based on partial differential equations (PDEs), mostly solved by finite element (FE) methods, resulting in millions of degrees of freedom. We provide an overview of the current state of mathematical models of the human phonation process, since they have served as valuable tools for providing insight into the basic mechanisms of phonation and may eventually be of sufficient detail and accuracy to allow surgical planning, diagnostics, and rehabilitation evaluations on an individual basis. Furthermore, we critically discuss these models with respect to the geometry used, boundary conditions, material properties, their verification, and reproducibility.