Objective: Multi-frequency-modulated visual stimulation schemes have recently been shown to be effective for steady-state visual evoked potential (SSVEP)-based brain-computer interfaces (BCIs), especially in increasing the number of visual targets with fewer stimulus frequencies and in mitigating visual fatigue. However, existing calibration-free recognition algorithms based on traditional canonical correlation analysis (CCA) do not deliver the expected performance. Approach: To improve recognition performance, this study proposes a phase difference constrained CCA (pdCCA), which assumes that multi-frequency-modulated SSVEPs share a common spatial filter across frequencies and have a specified phase difference. Specifically, during the CCA computation, the phase differences of the spatially filtered SSVEPs are constrained by temporally concatenating sine-cosine reference signals with pre-defined initial phases. Main results: We evaluate the proposed pdCCA-based method on three representative multi-frequency-modulated visual stimulation paradigms (based on multi-frequency sequential coding, dual-frequency modulation, and amplitude modulation). Results on four SSVEP datasets (Dataset Ia, Ib, II, and III) show that the pdCCA-based method significantly outperforms the current CCA method in recognition accuracy, improving it by 22.09% on Dataset Ia, 20.86% on Dataset Ib, 8.61% on Dataset II, and 25.85% on Dataset III. Significance: The pdCCA-based method, which actively controls the phase difference of multi-frequency-modulated SSVEPs after spatial filtering, is a new calibration-free method for multi-frequency-modulated SSVEP-based BCIs.
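The phase constraint described under Approach can be illustrated with a minimal Python sketch for the sequential-coding case. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`concat_reference`, `canonical_corr`, `classify`), the target codes, the initial phases, the sampling rate, and the simulated EEG are all hypothetical. The sketch assumes that each target is a sequence of frequencies, that one sine-cosine pair is built per target by concatenating segments in time, and that a single pair of CCA weights therefore applies to every segment, so the phase offset between segments stays fixed at the pre-defined value.

```python
import numpy as np

def concat_reference(freqs, phases, seg_len, fs):
    """Build one sine-cosine pair by concatenating segments in time.

    Segment k uses freqs[k] with initial phase phases[k]. Because only two
    columns exist for the whole trial, CCA can shift all segments by one
    common phase but cannot change the phase difference between segments.
    """
    t = np.arange(seg_len) / fs
    args = [2 * np.pi * f * t + p for f, p in zip(freqs, phases)]
    sin_part = np.concatenate([np.sin(a) for a in args])
    cos_part = np.concatenate([np.cos(a) for a in args])
    return np.column_stack([sin_part, cos_part])

def canonical_corr(X, Y):
    """Largest canonical correlation between X and Y (rows are samples).

    Uses the QR + SVD formulation and assumes both matrices have full
    column rank after mean removal.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(X)
    Qy, _ = np.linalg.qr(Y)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(np.clip(s[0], 0.0, 1.0))

def classify(eeg, codes, phases, seg_len, fs):
    """Return the index of the best-matching target and all scores."""
    scores = [
        canonical_corr(eeg, concat_reference(c, p, seg_len, fs))
        for c, p in zip(codes, phases)
    ]
    return int(np.argmax(scores)), scores

# Hypothetical setup: 4 targets, each coded by two 1-second frequency segments.
rng = np.random.default_rng(0)
fs, seg_len = 250, 250
codes = [(8.0, 10.0), (10.0, 8.0), (8.0, 12.0), (12.0, 8.0)]
phases = [(0.0, np.pi / 2)] * len(codes)  # assumed pre-defined initial phases

# Simulated 4-channel EEG for target 2 with an unknown common phase lag.
true_idx, lag = 2, 0.7
t = np.arange(seg_len) / fs
source = np.concatenate([
    np.sin(2 * np.pi * f * t + p + lag)
    for f, p in zip(codes[true_idx], phases[true_idx])
])
mixing = np.array([1.0, 0.8, 0.6, 0.4])  # assumed channel gains
noise = 0.5 * rng.standard_normal((source.size, mixing.size))
eeg = np.outer(source, mixing) + noise

predicted, scores = classify(eeg, codes, phases, seg_len, fs)
print("predicted target:", predicted)
print("scores:", np.round(scores, 2))
```

By contrast, a conventional CCA reference would typically give each frequency its own sine and cosine columns, which leaves the phase of every component free. The concatenated two-column reference removes that freedom, which is one plausible reading of how the phase difference is constrained.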