We present improved distributed algorithms for triangle detection and its variants in the CONGEST model. We show that Triangle Detection, Counting, and Enumeration can be solved in $\tilde{O}(n^{1/2})$ rounds. In contrast, the previous state-of-the-art bounds for Triangle Detection and Enumeration were $\tilde{O}(n^{2/3})$ and $\tilde{O}(n^{3/4})$, respectively, due to Izumi and Le Gall (PODC 2017).

The main technical novelty in this work is a distributed graph partitioning algorithm. We show that in $\tilde{O}(n^{1-\delta})$ rounds we can partition the edge set of the network $G = (V, E)$ into three parts $E = E_m \cup E_s \cup E_r$ such that:
• Each connected component induced by $E_m$ has minimum degree $\Omega(n^{\delta})$ and conductance $\Omega(1/\mathrm{polylog}(n))$. As a consequence, the mixing time of a random walk within the component is $O(\mathrm{polylog}(n))$.
• The subgraph induced by $E_s$ has arboricity at most $n^{\delta}$.
• $|E_r| \le |E|/6$.

All of our algorithms are based on the following generic framework, which we believe is of interest beyond this work. Roughly, we handle the set $E_s$ with an algorithm that is efficient on low-arboricity graphs, and handle the set $E_r$ with recursive calls. For each connected component induced by $E_m$, we simulate CONGESTED-CLIQUE algorithms with small overhead by applying the routing algorithm for high-conductance graphs due to Ghaffari, Kuhn, and Su (PODC 2017).

We consider Triangle Detection problems in distributed networks. In the LOCAL model [34], which places no limit on bandwidth, all variants of Triangle Detection can be solved in exactly one round of communication: every vertex $v$ simply announces its neighborhood $N(v)$ to all of its neighbors, after which each vertex $u$ knows that the edge $\{u, v\}$ lies on a triangle exactly when $N(u) \cap N(v) \neq \emptyset$. However, in models that take bandwidth into account, e.g., CONGEST, Triangle Detection becomes significantly more complicated. Whereas many graph optimization problems studied in the CONGEST model are intrinsically "global" (i.e., require at least diameter time) [2,11,12,14,15,20,26], Triangle Detection is somewhat unusual in that it can, in principle, be solved using only locally available information.

The CONGEST Model.
The underlying distributed network is represented as an undirected graph $G = (V, E)$, where each vertex corresponds to a computational device and each edge corresponds to a bidirectional communication link. We assume each $v \in V$ initially knows some global parameters such as $n = |V|$, $\Delta = \max_{v \in V} \deg(v)$, and $D = \mathrm{diameter}(G)$. Each vertex $v$ has a distinct $\Theta(\log n)$-bit identifier $\mathrm{ID}(v)$. The computation proceeds in synchronized rounds. In each round, each vertex $v$ can perform unlimited local computation and may send a distinct $O(\log n)$-bit message to each of its neighbors. Throughout the paper we consider only the randomized variant of CONGEST: each vertex may generate unlimited local random bits, but there is no shared global randomness.

The Congested Clique Model. The CONGESTED-CLIQUE model is a variant of CONGEST that allows all-to-all communication. Each vertex initially knows its incident edges and the set of vertex IDs, which we can assume w.l.o.g. is $\{1, \ldots, |V|\}$. In each round, each vertex transmits $n - 1$ messages of $O(\log n)$ bits each, one addressed to ...
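To make the gap between LOCAL and CONGEST concrete, the following is a minimal sketch (not the authors' code; the function names `local_triangle_detection` and `naive_congest_rounds` are hypothetical) of the one-round LOCAL algorithm in which every vertex announces $N(v)$ to its neighbors. It also computes the round count of the naive CONGEST simulation of that round, under the assumption that each $O(\log n)$-bit message carries a single vertex ID: shipping $N(v)$ across an edge then takes $\deg(v)$ rounds, so all edges finish in parallel after $\Delta$ rounds, which can be $\Theta(n)$ in dense graphs.

```python
def local_triangle_detection(adj):
    """Simulate the one-round LOCAL algorithm on a symmetric adjacency dict.

    Every vertex v sends its full neighborhood N(v) to each neighbor. A vertex u
    that receives N(v) reports the triangle {u, v, w} for every w in N(u) & N(v).
    """
    # inbox[u][v] is the neighborhood that neighbor v sent to u in the single round.
    inbox = {u: {v: adj[v] for v in adj[u]} for u in adj}
    triangles = set()
    for u in adj:
        for v, n_v in inbox[u].items():
            for w in adj[u] & n_v:
                # frozenset removes duplicates: each triangle is found at all 3 vertices.
                triangles.add(frozenset((u, v, w)))
    return triangles


def naive_congest_rounds(adj):
    """Rounds to send N(v) over every edge if one message holds one vertex ID.

    Edge (v, u) needs deg(v) rounds in direction v -> u; all edges work in
    parallel, so the whole exchange takes max_v deg(v) = Delta rounds.
    """
    return max(len(nbrs) for nbrs in adj.values())


# Usage: two triangles sharing the edge {2, 3}, plus a pendant vertex 5.
adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3, 5}, 5: {4}}
print(sorted(sorted(t) for t in local_triangle_detection(adj)))
print(naive_congest_rounds(adj))
```

The naive simulation is bandwidth-bound rather than locality-bound, which is why the $\tilde{O}(n^{1/2})$ bound is interesting precisely when $\Delta$ is large.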