The study of brain network interactions during naturalistic stimuli facilitates a deeper understanding of human brain function. Intersubject correlation (ISC) analysis of functional magnetic resonance imaging (fMRI) data is a widely used method that can measure neural responses to naturalistic stimuli that are consistent across subjects. However, interdependent correlation values in ISC artificially inflated the degrees of freedom, which hinders the investigation of individual differences. Besides, the existing ISC model mainly focus on similarities between subjects but fails to distinguish neural responses to different stimuli features. To estimate large-scale brain networks evoked with naturalistic stimuli, we propose a novel analytic framework to characterize shared spatio-temporal patterns across subjects in a purely data-driven manner. In the framework, a third-order tensor is constructed from the timeseries extracted from all brain regions from a given parcellation, for all participants, with modes of the tensor corresponding to spatial distribution, time series and participants. Tensor component analysis (TCA) will then reveal spatially and temporally shared components, i.e., naturalistic stimuli evoked networks, their temporal courses of activity and subject loadings of each component. To enhance the reproducibility of the estimation with TCA, a novel spectral clustering method, tensor spectral clustering, was proposed and applied to evaluate the stability of TCA algorithm. We demonstrate the effectiveness of the proposed framework via simulations and real fMRI data collected during a motor task with a traditional fMRI study design. We also apply the proposed framework to fMRI data collected during passive movie watching to illustrate how reproducible brain networks are identified evoked by naturalistic movie viewing.