Various semi-formal syntax templates for natural language requirements aim to reduce ambiguity while preserving human readability. Existing studies on their effectiveness focus on individual notations only and do not allow a systematic investigation of their quality benefits. We strive for a comparative benchmark and evaluation of template systems to assist practitioners in selecting appropriate ones and to enable researchers to work on targeted improvements and domain-specific adaptations. We conduct comparative experiments with five popular template systems: EARS, Adv-EARS, Boilerplates, MASTeR, and SPIDER. First, we compare a control group of free-text requirements with treatment groups consisting of variants of these requirements rewritten according to the different templates. Second, we compare MASTeR and EARS in user experiments on reading and writing requirements. Third, we analyse the formality and ontological expressiveness of all five meta-models based on the Bunge-Wand-Weber reference ontology. The comparison of the requirement phrasings across seven relevant quality characteristics and a dataset of 1764 requirements indicates that all template systems except SPIDER have positive effects on all characteristics. In a user experiment with 43 participants, mostly students, we found that templates require substantial prior training and that profound domain knowledge and experience are necessary to understand and write requirements in general. The evaluation of the template systems' meta-models suggests different levels of formality, modularity, and expressiveness. MASTeR and Boilerplates provide high numbers of variants to express requirements and achieve the best results with respect to completeness. Templates can generally improve various quality factors compared to free text. Although MASTeR leads the field, there is no conclusive favourite, as most effect sizes are relatively similar.