Background: The social environment, including social support, social burden, and quality of interactions, influences a range of health outcomes, including mental health. Passive audio data collection on mobile phones (e.g., episodic recording of the auditory environment without requiring any active input from the phone user) creates new opportunities to understand the social environment. We evaluated the use of passive audio collection on mobile phones as a window onto the social environment within a study of mental health among adolescent mothers in Nepal.

Methods: We enrolled 23 adolescent mothers, who first participated in qualitative interviews to describe their social support and identify sounds potentially associated with that support. Episodic recordings were then collected from the same women for two weeks using an app that captured 30 seconds of audio every 15 minutes from 4am to 9pm. Audio data were processed and classified using a pretrained model. Each classification category was accompanied by a predicted accuracy score. A 10% sample of the machine-predicted speech and non-speech categories was manually validated for accuracy.

Results: In qualitative interviews, mothers described a range of positive and negative social interactions and the sounds that accompanied them. Potential positive sounds included adult speech and laughter, baby babbling and laughter, and sounds from baby toys. Sounds characterizing negative stimuli included yelling, crying, and screaming by adults and crying by babies. Sounds associated with social isolation included silence and TV or radio noises. Speech comprised 43% of all passively recorded audio clips (n=7725). Manual validation showed a 23% false-positive rate and a 62% false-negative rate for speech, indicating potential underestimation of speech exposure.
Other common sounds included music and vehicular noises.

Conclusions: Passively capturing audio has the potential to improve understanding of the social environment. However, the pretrained model used in this study had limited accuracy and did not adequately distinguish between positive and negative social interactions. To improve the contribution of passive audio collection to understanding the social environment, future work should improve the accuracy of audio categorization, code for constellations of sounds, and combine audio with other smartphone data such as location and activity.
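
As a rough illustration of the sampling schedule and validation metrics described in the Methods and Results, the following Python sketch computes how many clips the stated schedule would yield per day and how false-positive and false-negative rates can be derived from manual validation counts. It is a minimal sketch, not the study's code. The function names `daily_slots` and `error_rates` are hypothetical, the 9pm boundary is assumed to be exclusive, the rate definitions are the conventional ones and may differ from those used in the study, and the validation counts are invented purely to exercise the formulas.

```python
from datetime import datetime, timedelta


def daily_slots(start_hour=4, end_hour=21, interval_min=15):
    """Return the clock times at which a clip would start on one day.

    Assumption: the end boundary (9pm) is exclusive, so the last clip
    of the day starts at 20:45.
    """
    t = datetime(2000, 1, 1, start_hour)   # arbitrary placeholder date
    end = datetime(2000, 1, 1, end_hour)
    slots = []
    while t < end:
        slots.append(t.time())
        t += timedelta(minutes=interval_min)
    return slots


def error_rates(tp, fp, tn, fn):
    """Return (false-positive rate, false-negative rate).

    Assumption: FPR = FP / (FP + TN) and FNR = FN / (FN + TP).
    """
    return fp / (fp + tn), fn / (fn + tp)


slots = daily_slots()
clips_per_day = len(slots)                     # scheduled clips per day
audio_min_per_day = clips_per_day * 30 / 60    # 30-second clips, in minutes
clips_two_weeks = clips_per_day * 14           # per participant, if no clip is missed

# Hypothetical validation counts, NOT taken from the study.
fpr, fnr = error_rates(tp=30, fp=10, tn=40, fn=20)

print(clips_per_day, audio_min_per_day, clips_two_weeks)
print(fpr, fnr)
```

Under these assumptions the schedule gives an upper bound on clips per participant. The number actually recorded would be lower whenever the phone is off or the app fails to record.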