Detection and vectorization of windows from building façades are important for building energy modeling, civil engineering, and architectural design. However, current applications still suffer from low accuracy and a lack of automation. In this paper, we propose a new two-step workflow for window segmentation and vectorization from façade images. First, we propose a cross-field-learning-based neural network architecture, augmented by a grid-based self-attention module, for window segmentation from rectified façade images, resulting in pixel-wise window blobs. Second, we propose a regression neural network augmented by Squeeze-and-Excitation (SE) attention blocks for window vectorization. The network takes the segmentation results together with the original façade image as input and directly outputs the positions of the window corners, resulting in vectorized window objects with improved accuracy. To validate the effectiveness of our method, we carry out experiments on four public façade image datasets. In most cases, the results show higher accuracy for the final window prediction than the baseline methods in terms of IoU score, F1 score, and pixel accuracy.
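For readers unfamiliar with the three evaluation measures named above, the following is a minimal sketch of how IoU, F1 score, and pixel accuracy are commonly computed for a binary window mask. It assumes the standard pixel-level definitions; the text above does not specify the exact evaluation protocol (for example, how scores are averaged across images), and the function names `confusion_counts` and `window_metrics` are hypothetical.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN over two same-shaped binary masks (lists of rows)."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # window pixel correctly predicted
            elif p and not t:
                fp += 1          # background predicted as window
            elif t:
                fn += 1          # window pixel missed
            else:
                tn += 1          # background correctly predicted
    return tp, fp, fn, tn


def window_metrics(pred, truth):
    """Return IoU, F1, and pixel accuracy (assumed standard definitions)."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    total = tp + fp + fn + tn
    union = tp + fp + fn
    # Convention chosen here: empty prediction and empty truth count as a perfect match.
    iou = tp / union if union else 1.0
    f1 = 2 * tp / (2 * tp + fp + fn) if union else 1.0
    accuracy = (tp + tn) / total if total else 1.0
    return {"iou": iou, "f1": f1, "pixel_accuracy": accuracy}


# Toy 4x4 example: the predicted window is shifted one column to the right.
truth = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
scores = window_metrics(pred, truth)
print(scores)
```

In this toy case the two masks overlap on 2 pixels out of a union of 6, so the IoU is 1/3, the F1 score is 0.5, and the pixel accuracy is 0.75. The gap between the last value and the first two illustrates why pixel accuracy alone can look optimistic when background pixels dominate a façade image.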