The Language ENvironment Analysis system (LENA™) automatically analyzes the natural sound environments of children. Among other things, it estimates the amounts of adult words (AWC), child vocalizations (CV), conversational turns (CT), and electronic media (TV) that a child is exposed to. To assess LENA's reliability, we compared it to manual transcription. Specifically, we calculated the correlation and agreement between the LENA estimates and manual counts for 48 five-minute audio samples. These samples were selected from eight day-long recordings of six Dutch-speaking children (ages 2-5). The correlations were strong for AWC, r = .87, and CV, r = .77, and comparatively low for CT, r = .52, and TV, r = .50. However, the agreement analysis revealed a constant bias in AWC counts, and proportional biases for CV and CT (i.e., the bias varied with the values for CV and CT). Agreement for detecting electronic media was poor. Moreover, the limits of agreement were wide for all four metrics. That is, the differences between LENA and the manual transcriptions for individual audio samples varied widely around the mean difference. This variation could indicate that LENA was affected by differences between the samples that did not equally affect the human transcribers. The disagreements and biases cast doubt on the comparability of LENA measurements across families and over time, which is crucial for using LENA in research. Our sample is too small to determine the limits within which LENA's measurements are comparable, but it seems advisable to be cautious about factors that could systematically bias LENA's performance and thereby create confounds.
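The distinction between correlation and agreement drawn above can be illustrated with a minimal sketch. It assumes a Bland-Altman-style analysis, which the abstract does not name explicitly. The function name `agreement_summary`, the 1.96 multiplier for 95% limits of agreement, and the use of a simple least-squares slope as an indicator of proportional bias are all assumptions made for illustration, not the authors' documented procedure. The demo values are invented and are not data from the study.

```python
import numpy as np


def agreement_summary(automatic, manual):
    """Summarize correlation and agreement between two sets of counts.

    Hypothetical helper: `automatic` and `manual` are paired counts for
    the same audio samples (e.g., LENA estimates vs. transcriber counts).
    """
    automatic = np.asarray(automatic, dtype=float)
    manual = np.asarray(manual, dtype=float)

    # Per-sample difference and per-sample mean of the two methods.
    diff = automatic - manual
    mean = (automatic + manual) / 2.0

    # Constant bias: the mean difference between the two methods.
    bias = diff.mean()

    # Limits of agreement: bias +/- 1.96 sample standard deviations
    # (assumed 95% convention).
    sd = diff.std(ddof=1)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)

    # Proportional bias indicator: slope of the differences regressed
    # on the means. A slope near zero suggests the bias is constant.
    slope, intercept = np.polyfit(mean, diff, 1)

    # Pearson correlation, which can be high even when agreement is poor.
    r = np.corrcoef(automatic, manual)[0, 1]

    return {
        "r": r,
        "bias": bias,
        "limits_of_agreement": limits,
        "slope": slope,
        "intercept": intercept,
    }


if __name__ == "__main__":
    # Invented counts for illustration only.
    manual_counts = [120, 80, 200, 45, 160, 95]
    automatic_counts = [100, 70, 150, 50, 130, 90]
    for key, value in agreement_summary(automatic_counts, manual_counts).items():
        print(key, value)
```

In this sketch, a method that overcounts every sample by the same amount yields a perfect correlation, a nonzero bias, and a slope near zero, whereas a method whose error grows with the count yields a nonzero slope. This is why a strong correlation alone does not establish that two measurement methods are interchangeable.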