Multi-party computation (MPC) has been gaining popularity in recent years as a secure computing model, particularly for machine learning (ML) inference. Compared with its alternatives, MPC incurs lower overhead than homomorphic encryption (HE) and offers a more robust threat model than hardware-based trusted execution environments (TEEs) such as Intel SGX. Despite these advantages, MPC protocols still pay substantial performance penalties compared to plaintext execution when applied to ML algorithms. The overhead comes from added computation and communication costs. For multiplications, which are ubiquitous in ML algorithms, MPC protocols add 32x more computational cost and one round of broadcasting among the MPC servers. Moreover, ML operations that have trivial cost in plaintext, such as Softmax, ReLU, and other non-linear operations, become very expensive due to the added communication. These added overheads make MPC less palatable to deploy in real-time ML inference applications, such as speech translation.

In our studies, we found that most MPC protocols today perform communication and computation in a sequential manner. This serialization is not a poor implementation choice, but a requirement for MPC to work correctly: without the data communication, the parties cannot progress to the next computation step. Thus, GPU servers acting as parties in an MPC setting sit idle while waiting for data transmission to complete, and GPU utilization is low during the communication phase. This phenomenon inspires us to enable MPC servers to perform computation and communication concurrently through a series of novel MPC-abiding computation transformations.

In this work we present MPC-Pipe, an MPC pipeline inference technique that uses two ML-specific schemes: 1) an inter-linear-layer pipeline and 2) an inner-layer pipeline. The first scheme benefits linear layers by transmitting input-independent MPC metadata ahead of time, and the second benefits non-linear layers by breaking large inputs into smaller chunks so that communication and computation overlap. Together, these two techniques shorten the total inference runtime of machine learning models. Our experiments show that MPC-Pipe reduces ML inference latency by up to 12.6% when model weights are private and 14.48% when model weights are public, compared to current MPC protocol implementations.
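To make the inner-layer pipeline idea concrete, the sketch below illustrates one way to overlap communication with computation by chunking a non-linear layer's input. It is a simplified illustration under our own assumptions, not the MPC-Pipe implementation; the names `exchange_shares`, `local_nonlinear_step`, and the chunk count are hypothetical placeholders for a protocol's communication and computation phases.

```python
# Minimal sketch (assumed, not the authors' code) of inner-layer pipelining:
# the input to a non-linear layer is split into chunks so that the share
# exchange for chunk i+1 can proceed while chunk i is being computed locally.
import threading
import numpy as np

def exchange_shares(chunk):
    """Placeholder: blocking share exchange with the other MPC parties."""
    ...  # network I/O happens here; without pipelining the GPU would be idle

def local_nonlinear_step(chunk):
    """Placeholder: local (GPU) computation on one chunk of the input."""
    ...

def pipelined_nonlinear(x, num_chunks=4):
    chunks = np.array_split(x, num_chunks)
    prev_thread, prev_chunk = None, None
    for chunk in chunks:
        # Start the share exchange for the current chunk in the background ...
        t = threading.Thread(target=exchange_shares, args=(chunk,))
        t.start()
        # ... while the previous chunk, whose communication has finished,
        # is consumed by local computation, keeping link and GPU busy together.
        if prev_thread is not None:
            prev_thread.join()
            local_nonlinear_step(prev_chunk)
        prev_thread, prev_chunk = t, chunk
    # Drain the pipeline: finish the last chunk's communication and compute.
    prev_thread.join()
    local_nonlinear_step(prev_chunk)
```

In this sketch the communication for one chunk hides behind the computation of the previous chunk; the actual protocol steps, chunk sizes, and synchronization in MPC-Pipe may differ.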