Background: Given the role of childhood aggressive behavior (AGG) in everyday child development, precise and accurate measurement is critical in clinical practice and research. This study aims to quantify agreement among widely used measures of childhood AGG regarding item content, clinical concordance, correlation, and underlying genetic construct. Methods: We analyzed data from 1254 Dutch twin pairs (age 8-10 years, 51.1% boys) from a general population sample for whom both parents completed the A-TAC, CBCL, and SDQ on the same occasion. Results: Agreement in item content among AGG measures varied substantially, ranging from .00 (i.e., mutually exclusive) to .50 (moderate agreement). Clinical concordance (i.e., whether the same children score above a clinical threshold across AGG measures) was very weak to moderate, with estimates ranging between .01 and .43 for mother-reports and between .12 and .42 for father-reports. Correlations among scales were weak to strong, ranging from .32 to .70 for mother-reports and from .32 to .64 for father-reports. We found weak to very strong genetic correlations among the measures, with estimates between .65 and .84 for mother-reports and between .30 and .87 for father-reports. Conclusions: Our results demonstrate that the degree of agreement between measures of AGG depends on the type of agreement considered (i.e., item content, clinical concordance, correlation, genetic correlation). Because agreement was higher for correlations than for clinical concordance (i.e., scoring above or below a clinical cutoff), we propose the use of continuous scores to assess AGG, especially when combining data from different measures. Although item content can differ and agreement among observed measures may not be high, the genetic correlations indicate that the underlying genetic liability for childhood AGG is consistent across measures.