Judgment tasks (JTs, often called acceptability or grammaticality judgment tasks) have been used extensively throughout the history of second language (L2) research (Chaudron, 1983). Data from such instruments have been used to investigate a range of hypotheses and phenomena, from generativist theories to instructional effectiveness. Though popular and convenient, JTs have engendered considerable controversy, with concerns often centered on their construct validity, in particular on the type of knowledge they elicit (e.g., implicit or explicit; Ellis, 2005; Vafaee et al., 2016). A number of studies have also examined the impact of JT conditions such as timing (timed vs. untimed) and modality (oral vs. written) (e.g., Murphy, 1997; Spada et al., 2015). This paper presents a synthesis of the use of JTs and a meta-analysis of the effects of task conditions on learner performance. Following a comprehensive search, 385 JTs were found in 302 individual studies. Each report was coded for features related to study design as well as methodological, procedural, and psychometric properties of the JTs. These data were synthesized in order to understand how this type of instrument has been implemented and reported. In addition to observing a steady increase in the use of JTs over the last four decades, we found that many features of JTs, when reported, varied substantially across studies. In terms of the impact of JT design, whereas modality was not found to have a strong or stable effect on learner performance (median d = .14; IQR = 1.04), scores on untimed JTs tended to be substantially higher than scores on timed JTs (d = 1.35; IQR = 1.74). In examining these features and their links to findings, this paper builds on a growing body of methodological syntheses of L2 research instrumentation (e.g., Derrick, 2016; Marsden et al., in press) and makes a number of empirically grounded recommendations for future studies involving JTs.
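For readers less familiar with the summary statistics reported above, the following is a minimal sketch of how a standardized mean difference (Cohen's d) and a median with interquartile range (IQR) across studies might be computed. It assumes a between-groups d based on the pooled standard deviation and the "inclusive" quartile method; the function names `cohens_d` and `median_and_iqr` are hypothetical, the input values are invented purely for illustration, and none of this is presented as the procedure actually used in the meta-analysis.

```python
import statistics
from math import sqrt


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference (condition A minus B), pooled-SD version.

    Assumption: independent groups. A within-subjects comparison would
    need a different standardizer.
    """
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / sqrt(pooled_var)


def median_and_iqr(effect_sizes):
    """Return (median, IQR) for a list of effect sizes.

    Assumption: quartiles use the 'inclusive' method; other quartile
    conventions give slightly different IQRs.
    """
    q1, q2, q3 = statistics.quantiles(effect_sizes, n=4, method="inclusive")
    return q2, q3 - q1


# Invented (mean, SD, n) summaries for untimed vs. timed JT scores
# in three hypothetical studies; not data from any real report.
hypothetical_studies = [
    ((80.0, 10.0, 30), (70.0, 10.0, 30)),
    ((75.0, 12.0, 25), (60.0, 12.0, 25)),
    ((68.0, 8.0, 40), (66.0, 8.0, 40)),
]

d_values = [cohens_d(*untimed, *timed) for untimed, timed in hypothetical_studies]
median_d, iqr_d = median_and_iqr(d_values)
print(f"median d = {median_d:.2f}; IQR = {iqr_d:.2f}")
```

Reporting the median and IQR rather than only a mean conveys both the typical effect and how widely it varies across studies, which is why an IQR larger than the median d itself signals an unstable effect.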