Anomaly-based intrusion detection systems (AIDS) play an increasingly important role in detecting complex, multi-stage network attacks, especially zero-day attacks. Despite progress in both practical applications and research, many accuracy-related concerns remain unresolved. Two fundamental limitations contribute to these concerns: i) learning a succinct latent representation of normal network data, and ii) optimizing the volume of the normal regions in the latent space. Recent studies have proposed many ways to learn the latent representation of normal network data in a semi-supervised manner to construct AIDS. However, these approaches are still affected by the above limitations, mainly because they cannot handle high-dimensional data or do not effectively explore the underlying structure of the data. In this paper, we propose a novel Deep Nested Clustering Auto-Encoder (DNCAE) model to address these difficulties and improve the performance of network attack detection. The proposed model consists of two nested Deep Auto-Encoders (DAEs) that learn an informative and tighter data representation space. In addition, the DNCAE model integrates a clustering technique into the latent layer of the outer DAE to learn the optimal arrangement of data points in the latent space. This combination allows the model to deal with both limitations outlined above. The performance of the proposed model is evaluated on standard datasets: NSL-KDD, UNSW-NB15, and six scenarios of CIC-IDS2017 (Tuesday, Wednesday, Thursday-Morning, Friday-Morning, Friday-Afternoon-PortScan, Friday-Afternoon-DDoS). The experimental results show that the proposed model outperforms the baselines and existing methods for network anomaly detection.
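To make the nested arrangement described above more concrete, the following is a minimal sketch of one possible forward pass and combined objective, not the DNCAE implementation itself. Everything beyond the abstract is an assumption made for illustration: the inner auto-encoder is assumed to reconstruct the outer latent vector, the layers are bias-free linear maps with no activation, the clustering term is a k-means-style squared distance to the nearest centroid, and the names `nested_loss`, `alpha`, `beta`, and the layer sizes are hypothetical.

```python
import random


def linear(x, w):
    """Apply a bias-free linear map; w holds one row of weights per output unit."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def init(rows, cols, rng):
    """Random weight matrix with `rows` output units and `cols` inputs."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def nested_loss(x, params, centroids, alpha=1.0, beta=1.0):
    """Hypothetical combined objective for two nested auto-encoders.

    Assumed data flow: x -> outer latent -> inner latent
                         -> reconstructed outer latent -> reconstructed x.
    """
    z_outer = linear(x, params["enc_outer"])            # outer encoder
    z_inner = linear(z_outer, params["enc_inner"])      # inner encoder
    z_outer_hat = linear(z_inner, params["dec_inner"])  # inner decoder
    x_hat = linear(z_outer_hat, params["dec_outer"])    # outer decoder

    outer_rec = sq_dist(x, x_hat)              # input reconstruction error
    inner_rec = sq_dist(z_outer, z_outer_hat)  # outer-latent reconstruction error
    # Clustering term on the outer latent layer (assumed k-means style):
    # squared distance to the nearest centroid.
    cluster = min(sq_dist(z_outer, c) for c in centroids)

    return {
        "outer_rec": outer_rec,
        "inner_rec": inner_rec,
        "cluster": cluster,
        "total": outer_rec + alpha * inner_rec + beta * cluster,
    }


# Illustrative sizes only: 6 input features, outer latent 4, inner latent 2.
rng = random.Random(0)
params = {
    "enc_outer": init(4, 6, rng),
    "enc_inner": init(2, 4, rng),
    "dec_inner": init(4, 2, rng),
    "dec_outer": init(6, 4, rng),
}
centroids = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
x = [rng.uniform(0, 1) for _ in range(6)]

parts = nested_loss(x, params, centroids)
print(parts)
```

In a semi-supervised setting of the kind the abstract describes, such an objective would be minimized on normal traffic only. How the trained model scores test samples is not stated here, so the sketch stops at the loss.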