Background: The Four-Dimensional Symptom Questionnaire (4DSQ) is a self-report questionnaire measuring distress, depression, anxiety and somatization with separate scales. The 4DSQ has been extensively validated in clinical samples, especially from primary care settings. Information about measurement properties and normative data in the general population was lacking. In a Dutch general population sample we examined the structure of the 4DSQ scales, their reliability and measurement invariance with respect to gender, age and education, their score distributions across demographic categories, and normative data.

Methods: 4DSQ data were collected in a representative Dutch Internet panel. Confirmatory factor analysis was used to examine the scales’ structure. Reliability was examined by Cronbach’s alpha and by coefficients omega-total and omega-hierarchical. Differential item functioning (DIF) analysis was used to evaluate measurement invariance across gender, age and education.

Results: The total response rate was 82.4 % (n = 5273/6399). The depression scale proved to be unidimensional. The other scales were best represented as bifactor models consisting of a large general factor and one or more smaller specific factors. The general factors accounted for more than 95 % of the reliable variance of the scales. Reliability was high (≥0.85) by all estimates. The distress, depression and anxiety scales were invariant across gender, age and education. The somatization scale demonstrated some lack of measurement invariance as a result of decreased thresholds for some of the items in young people (16–24 years) and increased thresholds in elderly people (65+ years). The somatization scale was invariant regarding gender and education. The 4DSQ scores varied significantly across demographic categories, but the explained variance was small (<6 %). Normative data were generated for gender and age categories.
Approximately 17 % of the participants scored above average on the distress scale, whereas 12 % scored above average on the somatization scale. The percentages of people scoring high enough on depression or anxiety to suspect the presence of a depressive or anxiety disorder were 4.1 % and 2.5 %, respectively.

Conclusions: Evidence supports reliability and measurement invariance of the 4DSQ in the general Dutch population. The normative data provided in this study can be used to compare a subject’s 4DSQ scores with a general population reference group.

Electronic supplementary material: The online version of this article (doi:10.1186/s12955-016-0533-4) contains supplementary material, which is available to authorized users.
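
As an illustration of one of the reliability estimates named in the Methods, the following is a minimal sketch of Cronbach’s alpha, computed as k/(k − 1) × (1 − Σ item variances / variance of the sum score). It is not the authors’ analysis code: the function name `cronbach_alpha` and the toy responses are hypothetical, and the 0–2 item scoring is assumed for illustration only.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Hypothetical helper for illustration; uses sample variances (n - 1).
    """
    n_respondents = len(item_scores)
    k = len(item_scores[0])
    if k < 2 or n_respondents < 2:
        raise ValueError("need at least 2 items and 2 respondents")

    # Variance of each item across respondents
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of the sum score across respondents
    total_var = variance([sum(row) for row in item_scores])

    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up responses (5 respondents x 4 items), NOT data from the study
toy_responses = [
    [0, 1, 0, 1],
    [1, 1, 1, 2],
    [2, 2, 1, 2],
    [0, 0, 0, 1],
    [2, 1, 2, 2],
]
alpha = cronbach_alpha(toy_responses)
print(f"alpha = {alpha:.3f}")
```

Alpha assumes that all items load equally on a single factor. For scales with a bifactor structure, such as those reported above, the omega coefficients separate the contribution of the general factor from that of the specific factors, which alpha cannot do.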