Molecular dynamics (MD) simulations were designed and implemented on graphics processing units (GPUs) using the Compute Unified Device Architecture (CUDA). On an NVIDIA Tesla C870, a 20- to 60-fold speedup over one core of an Intel Xeon 5430 CPU was achieved, with performance reaching up to 150 Gflops. MD simulations of cavity flow and of particle-bubble interaction in liquid were then implemented on multiple GPUs using the Message Passing Interface (MPI). Up to 200 GPUs were tested on a special network topology, with good scalability. These results demonstrate the capability of GPU clusters for large-scale MD simulation of meso-scale flow behavior.