To keep pace with growing terabit link rates, highly parallel and scalable architectures are needed for IP lookup engines in next-generation routers. This paper proposes an SRAM-based multi-pipeline architecture for multi-terabit-rate IP lookup. The architecture consists of multiple bidirectional linear pipelines, where each pipeline stores part of the routing table. We address the challenges of realizing such a solution. Two mapping schemes of different granularities are proposed to balance the memory distribution over the pipelines as well as across the stages within each pipeline. IP caching is also adopted to facilitate processing multiple packets per clock cycle. Instead of large reorder buffers and complex logic, a lightweight scheduler and several small output delay queues are developed to preserve intra-flow packet order. Simulation experiments using real-life data show that the proposed 4-pipeline architecture can store a core routing table with over 200K unique routing prefixes in less than 2 MB of memory and can achieve a throughput of up to 18.75 billion packets per second (GPPS), i.e., 6 Tbps for minimum-size (40-byte) packets.
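For reference, the quoted bit rate follows directly from the stated packet rate and the minimum packet size assumed above:
\[
18.75 \times 10^{9}\ \text{packets/s} \times 40\ \text{bytes/packet} \times 8\ \text{bits/byte} = 6 \times 10^{12}\ \text{bits/s} = 6\ \text{Tbps}.
\]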