Multivariate time series anomaly detection is crucial for preventing unexpected faults from causing critical failures. Effective anomaly detection in such data requires accurately capturing temporal patterns and having adequate training data available. This study proposes a patch-wise framework for anomaly detection. The proposed approach comprises four key components: (i) preserving continuous features through patching, (ii) incorporating diverse temporal information by learning channel dependencies and adding a relative positional bias, (iii) learning feature representations through self-supervised learning, and (iv) supervised learning based on anomaly augmentation for the downstream task. The proposed method achieves strong anomaly detection performance by leveraging patching to maintain temporal continuity while effectively learning data representations and handling downstream tasks. In addition, it mitigates the scarcity of anomaly data by supporting the learning of diverse anomaly types. Experimental results show that our model achieves a 23% to 205% improvement in F1 score over existing methods on datasets such as MSL, which have relatively little training data, and delivers competitive performance on the SMAP dataset. By systematically learning both local and global dependencies, the proposed method strikes an effective balance between feature representation and anomaly detection accuracy, making it a valuable tool for real-world multivariate time series applications.
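To make the patching component concrete, the following is a minimal sketch of how a multivariate series can be split into overlapping patches that preserve local temporal continuity. The function name `make_patches` and the specific patch length and stride are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def make_patches(series: np.ndarray, patch_len: int, stride: int) -> np.ndarray:
    """Split a multivariate series of shape (T, C) into patches of shape
    (num_patches, patch_len, C). Each patch keeps patch_len consecutive
    time steps intact, preserving local temporal continuity; overlapping
    strides (stride < patch_len) share context between adjacent patches.
    """
    T, C = series.shape
    num_patches = (T - patch_len) // stride + 1
    return np.stack([
        series[i * stride : i * stride + patch_len]
        for i in range(num_patches)
    ])

# Example: 100 time steps, 3 channels, patch length 16, stride 8
x = np.random.randn(100, 3)
patches = make_patches(x, patch_len=16, stride=8)
print(patches.shape)  # (11, 16, 3)
```

In a patch-wise model, each patch (rather than each individual time step) becomes a token, so downstream layers can learn dependencies across patches while the raw values inside each patch remain contiguous.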