The prevalent method in syntax and semantics research involves obtaining a judgment of the acceptability of a sentence / meaning pair, typically from just the author of the paper, sometimes with feedback from colleagues. This methodology does not allow proper testing of scientific hypotheses because of (a) the small number of experimental participants (typically one); (b) the small number of experimental stimuli (typically one); (c) cognitive biases on the part of the researcher and participants; and (d) the effect of the preceding context (e.g., other constructions the researcher may have been recently considering). In the current paper we respond to some arguments that have been given in support of continuing to use the traditional non-quantitative method in syntax / semantics research. One recent defense of the traditional method comes from Phillips (2008), who argues that no harm has come from the non-quantitative approach in syntax research thus far. Phillips argues that there are no cases in the literature where an incorrect intuitive judgment has become the basis for a widely accepted generalization or an important theoretical claim. He therefore concludes that there is no reason to adopt more rigorous data collection standards. We challenge Phillips' conclusion by presenting three cases from the literature where a faulty intuition has led to incorrect generalizations and mistaken theorizing, plausibly due to cognitive biases on the part of the researchers. Furthermore, we present additional arguments for rigorous data collection standards.
For example, allowing lax data collection standards has the undesirable effect that the results and claims will often be ignored by researchers with stronger methodological standards. Finally, we observe that behavioral experiments are easier to conduct in English than ever before, with the advent of Amazon.com's Mechanical Turk, a marketplace interface that can be used for collecting behavioral data over the internet.