Leaks in underground water distribution networks (WDNs) waste over 1 billion gallons of water annually in the US and cause significant socio-economic losses to communities. However, detecting and localizing leakage in a WDN remains a challenging technical problem despite significant progress in this domain. Advances in machine learning (ML) provide new, data-driven ways to identify leakage. However, in-service WDNs lack labeled data under leaking conditions, which makes it infeasible to use common ML models. This study proposes a novel ML-based framework for WDN leak detection and localization. The framework, named clustering-then-localization semi-supervised learning (CtL-SSL), uses the topological relationships of the WDN and its leakage characteristics for WDN partitioning and sensor placement, and subsequently uses the monitoring data for leakage detection and localization. Applied to two testbed WDNs, the CtL-SSL framework achieves 95% leakage detection accuracy and around 83% final leakage localization accuracy using imbalanced data with less than 10% leaking data. The CtL-SSL framework advances leak detection strategy by alleviating data requirements, guiding optimal sensor placement, and locating leakage via WDN leakage zone partitioning. It features excellent scalability, extensibility, and upgradeability for application to various types of WDNs, and it will provide a valuable tool for the sustainable management of WDNs.