Generic object detection is a crucial task for autonomous driving. A safe and efficient object detector must meet three requirements: high accuracy, real‐time inference speed, and small model size. Herein, a simple yet effective anchor‐free object detector named L4Net is proposed, which incorporates a keypoint detection backbone and a co‐attention scheme into a unified framework, and achieves lower computation cost with higher detection accuracy than prior art across a wide spectrum of resource constraints. Specifically, the backbone uses a Multi‐scale Receptive‐fields Enhancement (MRE) module to capture contextual information, taking both object scale and shape invariance into account. The co‐attention scheme combines the strengths of Class‐agnostic Attention (CA) and Semantic Attention (SA), and exploits valuable features from low level to high level to generate more accurate prediction boxes. In contrast to previous feature fusion strategies, multi‐scale features are integrated selectively by fully exploiting the different characteristics of low‐level and high‐level features, which leads to a smaller model and faster inference. Extensive experiments on four well‐known datasets demonstrate the effectiveness of the method. For instance, L4Net achieves 71.68% mAP on the KITTI test set with a model size of 13.7 M, running at 149 FPS on NVIDIA TX and 30.7 FPS on a Qualcomm‐based device, which makes it 4x smaller and 2x faster than the baseline model.
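
To put the reported throughput in terms of a per‐frame time budget, the FPS figures can be converted to latency. The following is a minimal sketch only: the helper name `latency_ms` is hypothetical and not part of L4Net, and it assumes frames are processed one at a time (no batching or pipelining), which the abstract does not state.

```python
def latency_ms(fps: float) -> float:
    """Convert a throughput in frames per second to per-frame latency in ms.

    Assumption (not from the paper): one frame is processed at a time,
    so latency is simply the reciprocal of the throughput.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# FPS figures as reported in the abstract; the device labels are copied verbatim.
reported_fps = {
    "NVIDIA TX": 149.0,
    "Qualcomm-based device": 30.7,
}

for device, fps in reported_fps.items():
    # About 6.71 ms per frame at 149 FPS and about 32.57 ms at 30.7 FPS.
    print(f"{device}: {latency_ms(fps):.2f} ms per frame")
```

Under this assumption, the embedded GPU figure leaves a per‐frame budget of roughly 6.7 ms, and the Qualcomm‐based figure roughly 32.6 ms.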