As urbanization accelerates, the acoustic and thermal environments of parks, which are vital urban public open spaces, directly affect visitor comfort and park sustainability. Taking Xihu Park in subtropical Fuzhou, China, as a typical example, this study uses covert observational experiments under three typical sound conditions (grass cutting, music, and no sound source) across temperature levels to examine how thermal–acoustic interactions influence crowd behavior in the park. The findings are as follows: (1) Melodious music can attract more visitors, while strongly stimulating grass cutting noise under high temperatures reduces crowd flow. Apart from unpleasant sound sources, park soundscapes have a relatively limited influence on attractiveness to crowd flow across temperatures. (2) High temperatures diminish visitors' interest in landscape experiences and their willingness to stay, especially when soundscape quality is poor. Under non-high temperatures, the acoustic environment has only a minor impact on staying time. (3) Soundscape quality affects whether people choose paths that approach or avoid sound sources, with grass cutting noise having the most negative influence. The effects of music, grass cutting sounds, and natural sounds differ conspicuously under varied temperatures. (4) Comfortable acoustic environments can draw larger crowds and slow the walking pace. High temperatures also slow the walking pace. Sound type has a significant influence on crowd movement velocity at all three typical temperature levels. This study comprehensively investigates the mechanisms by which typical thermal–acoustic environments affect park crowd behavior, providing important references for optimizing the acoustic and thermal environments of urban parks while also enriching related research on landscape design and environmental psychology. 
Future studies can explore these effects in greater depth by creating a wider range of thermal–acoustic combinations and by probing differences across diverse populations.