Window detection is a key component in many graphics and vision applications related to 3D city modeling and scene visualization. We present a novel approach for learning to recognize windows in a colored facade image. Rather than predicting bounding boxes or performing facade segmentation, our system locates keypoints of windows, and learns keypoint relationships to group them together into windows. A further module provides extra recognizable information at the window center. Locations and relationships of keypoints are encoded in different types of heatmaps, which are learned in an end-to-end network. We have also constructed a facade dataset with 3 418 annotated images to facilitate research in this field. It has richly varying facade structures, occlusion, lighting conditions, and angle of view. On our dataset, our method achieves precision of 91.4% and recall of 91.0% under 50% IoU (intersection over union). We also make a quantitative comparison with state-of-the-art methods to verify the utility of our proposed method. Applications based on our window detector are also demonstrated, such as window blending.