Face detection and landmark localization are fundamental steps in facial analysis, and accuracy degradation in either task reduces the accuracy and robustness of downstream analysis. In most facial analysis systems, the two are treated as independent tasks and predicted sequentially by separate single-task detectors, which leads to higher complexity and poorer consistency. In contrast, multi-task joint face and landmark detectors are more efficient because the two tasks share features. However, when both tasks are predicted by one model, it is difficult to improve the accuracy of both simultaneously, because the targets to be detected impose different requirements. We argue that a multi-task detection model that perceives both scale variation and detailed texture information for face and landmark detection is difficult to build but important in unconstrained settings. In this paper, we propose a High-Performance joint Face and Landmark Detector (HP-FLDet), which predicts facial bounding boxes and landmarks in a single stage and improves the accuracy of both face detection and landmark localization simultaneously. In this detector, a context inception block and scale-aware multi-task strategies are designed to handle extreme scale variance and improve the perception of detailed texture information. HP-FLDet demonstrates superior performance on face and landmark detection datasets including AFW, PASCAL Face, FDDB, WIDER FACE and AFLW. In particular, on the most challenging WIDER FACE dataset it achieves mAP of 0.970 (Easy), 0.963 (Medium) and 0.921 (Hard) on the test subset, outperforming all state-of-the-art rivals.