Pipelined SRAM-based algorithmic solutions have become competitive alternatives to TCAMs (ternary content addressable memories) for high throughput IP lookup. Multiple pipelines can be utilized in parallel to improve the throughput further. However, several challenges must be addressed to make such solutions feasible. First, the memory distribution over different pipelines as well as across different stages of each pipeline must be balanced. Second, the traffic among these pipelines should be balanced. Third, the intra-flow packet order should be preserved. In this paper, we propose a parallel SRAM-based multi-pipeline architecture for IP lookup. A two-level mapping scheme is developed to balance the memory requirement among the pipelines as well as across the stages in a pipeline. To balance the traffic, we propose a flow pre-caching scheme to exploit the inherent caching in the architecture. Our technique uses neither a large reorder buffer nor complex reorder logic. Instead, a payload exchange scheme exploiting the pipeline delay is used to maintain the intra-flow packet order. Extensive simulation using real-life traffic traces shows that the proposed architecture with 8 pipelines can achieve a throughput of up to 10 billion packets per second (GPPS) while preserving intra-flow packet order.