Stream Reassembly is an indispensable function of Deep Packet Inspection, which is a critical element of a Network Intrusion System. However, because it must repeatedly copy packet payloads from one block of memory to another, Stream Reassembly suffers from a serious memory performance problem. To improve Stream Reassembly performance, this paper presents the design of a Stream Reassembly Card (SRC), which uses a multi-core NPU to manage and reassemble streams by adding a level of buffering that corrects the order of packets. Specifically, three optimization techniques are introduced in the SRC design: Stream Table Dispatching, No-Locking Timeout, and Multi-channel Virtual Queue. Experiments show that reassembly achieves a processing speed of more than 3 Gbps, about three times that of the traditional server-based architecture.