Distributed-memory triangle enumeration has attracted considerable interest because of its potential to process huge graphs quickly. However, existing algorithms run slowly because of high communication cost and load imbalance. To address these problems, we propose LiteTE, a lightweight, communication-efficient triangle enumeration scheme. To reduce communication cost, LiteTE introduces several techniques, including a graph partitioning method that fully leverages the large memory of commodity servers and the high bandwidth of modern networks, and a fast broadcast algorithm that effectively utilizes the bidirectional bandwidth of network cables and the aggregate bandwidth of the cluster. To reduce load imbalance, LiteTE introduces techniques at three levels: a codesign of graph partitioning and partition-level load balancing, a decentralized dynamic node-level load balancing technique, and a chunk-based lock-free work-stealing technique, all of which are lightweight and incur little or no communication cost. Experimental results show that LiteTE considerably reduces communication cost and load imbalance and outperforms state-of-the-art algorithms in metrics such as setup time, runtime, scalability, and load balance. On a small-scale cluster, LiteTE enumerates all 15 trillion triangles in a graph with 92 billion edges in 15 minutes, whereas the other algorithms fail to complete.

INDEX TERMS Triangle enumerating, triangle computation, graph processing, distributed computing, parallel processing.
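To make the underlying task concrete, the sketch below enumerates every triangle of a small undirected graph exactly once on a single machine. It is not LiteTE's distributed algorithm. It uses the common degree-ordered "node-iterator" approach, in which each edge is oriented from the lower-ranked to the higher-ranked endpoint and each triangle is found by intersecting out-neighbor sets. The function name `enumerate_triangles` and the edge-list input format are hypothetical choices made for illustration.

```python
from collections import defaultdict


def enumerate_triangles(edges):
    """List each triangle of an undirected simple graph exactly once.

    Illustrative single-machine sketch (hypothetical helper, not LiteTE):
    vertices are ranked by (degree, id), every edge is oriented from the
    lower-ranked to the higher-ranked endpoint, and a triangle u < v < w
    (in rank order) is reported only when scanning edge (u, v).
    """
    # Build an undirected adjacency structure, ignoring self-loops.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    # Total order on vertices: by degree, ties broken by vertex id.
    rank = {v: (len(nbrs), v) for v, nbrs in adj.items()}

    # Keep only neighbors of higher rank, so each edge appears in one direction.
    out = {v: {w for w in nbrs if rank[w] > rank[v]} for v, nbrs in adj.items()}

    triangles = []
    for u in out:
        for v in out[u]:
            # Any common higher-ranked neighbor w closes the triangle (u, v, w).
            for w in out[u] & out[v]:
                triangles.append((u, v, w))
    return triangles


if __name__ == "__main__":
    # A triangle 0-1-2 with a pendant edge 2-3.
    print(enumerate_triangles([(0, 1), (1, 2), (2, 0), (2, 3)]))
```

Orienting edges by degree keeps the out-neighbor sets of high-degree vertices small, which is one reason this ordering is widely used as a sequential baseline. Distributed schemes such as the one described above must additionally decide how to partition the graph and the intersection work across machines.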