Symptom measurement in psychiatric research increasingly relies on digitized self-report inventories and is turning to crowdsourcing platforms such as Amazon Mechanical Turk (mTurk) for recruitment. The impact of digitizing pencil-and-paper inventories on their psychometric properties remains underexplored. Against this background, numerous studies report high prevalence estimates of psychiatric symptoms in mTurk samples. Here we develop a framework for evaluating the online implementation of psychiatric symptom inventories across two domains: adherence to (i) validated scoring and (ii) standardized administration. We apply this framework to the online use of the Patient Health Questionnaire-9 (PHQ-9), the Generalized Anxiety Disorder-7 (GAD-7), and the Alcohol Use Disorders Identification Test (AUDIT). Our systematic review of the literature identified 36 implementations of these three inventories on mTurk across 27 publications. We also evaluated methodological approaches to enhancing data quality, e.g., the use of bot detection and attention-check items. Of the 36 implementations, 23 reported the diagnostic scoring criteria applied, and only 18 reported the specified symptom timeframe. None of the 36 implementations reported adaptations made in digitizing the inventories. While recent reports attribute higher rates of mood, anxiety, and alcohol use disorders on mTurk to data quality issues, our findings indicate that this inflation may also relate to the assessment methods themselves. We provide recommendations to enhance both data quality and fidelity to validated administration and scoring methods.