Objective. Spatial filtering has proved to be a powerful pre-processing step in detection of steady-state visual evoked potentials and boosted typical detection rates both in offline analysis and online SSVEP-based brain–computer interface applications. State-of-the-art detection methods and the spatial filters used thereby share many common foundations as they all build upon the second order statistics of the acquired Electroencephalographic (EEG) data, that is, its spatial autocovariance and cross-covariance with what is assumed to be a pure SSVEP response. The present study aims at highlighting the similarities and differences between these methods. Approach. We consider the canonical correlation analysis (CCA) method as a basis for the theoretical and empirical (with real EEG data) analysis of the state-of-the-art detection methods and the spatial filters used thereby. We build upon the findings of this analysis and prior research and propose a new detection method (CVARS) that combines the power of the canonical variates and that of the autoregressive spectral analysis in estimating the signal and noise power levels. Main results. We found that the multivariate synchronization index method and the maximum contrast combination method are variations of the CCA method. All three methods were found to provide relatively unreliable detections in low signal-to-noise ratio (SNR) regimes. CVARS and the minimum energy combination methods were found to provide better estimates for different SNR levels. Significance. Our theoretical and empirical results demonstrate that the proposed CVARS method outperforms other state-of-the-art detection methods when used in an unsupervised fashion. Furthermore, when used in a supervised fashion, a linear classifier learned from a short training session is able to estimate the hidden user intention, including the idle state (when the user is not attending to any stimulus), rapidly, accurately and reliably.