Regular Expression Matching (REM) is the core of Deep Packet Inspection (DPI), which underpins many network security applications. Emerging Software Defined Networking and Network Function Virtualization technologies make networks more dynamic, which challenges DPI engines to deliver high matching performance while supporting fast rule-set updates. To meet these challenges, this paper proposes a heterogeneous Field Programmable Gate Array (FPGA)-Central Processing Unit (CPU) architecture that accelerates Deterministic Finite Automaton (DFA)-based REM while keeping rule-set preprocessing fast. First, a novel regex decomposition technique is proposed to solve the DFA state explosion problem by splitting each regex into one prefix and several postfixes. Second, a heterogeneous architecture is presented to handle regex matching collaboratively: prefixes are matched in parallel on the FPGA, and postfixes are matched on the CPU. To further improve matching performance, several DFA compression techniques and regex decomposition optimizations are proposed. The design has been implemented in a DPI prototype that employs a medium-end FPGA and evaluated through extensive experiments. The results show that the proposed architecture achieves 6.33 Gbps matching throughput on the Snort rule-set (v3.0), which is close to that of state-of-the-art FPGA-based Nondeterministic Finite Automaton (NFA) schemes. At the same time, rule-set preprocessing time is reduced to less than 7 minutes, compared with up to several hours for FPGA NFA-based counterparts.
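To make the prefix/postfix idea concrete, the following is a minimal software-only sketch, not the decomposition algorithm or hardware design of this paper. It assumes, purely for illustration, that a regex is cut at every `.*` and that each resulting segment matches strings of a fixed length (for example, plain literals), so that the leftmost match of a segment is also its earliest-ending match. The names `decompose` and `two_stage_match` are hypothetical, and Python's `re` module stands in for both the FPGA prefix DFAs and the CPU postfix matcher.

```python
import re


def decompose(pattern, sep=".*"):
    """Split a regex into one prefix and a list of postfixes.

    Illustrative assumption: `sep` appears only at the top level of the
    pattern, never inside a group, a character class, or an escape.
    """
    parts = pattern.split(sep)
    return parts[0], parts[1:]


def two_stage_match(pattern, payload):
    """Match `pattern` against `payload` in two stages."""
    prefix, postfixes = decompose(pattern)

    # Stage 1: prefix matching (software stand-in for the FPGA side).
    m = re.compile(prefix, re.DOTALL).search(payload)
    if m is None:
        return False  # most payloads would be rejected here
    pos = m.end()

    # Stage 2: postfix matching (software stand-in for the CPU side).
    # Each postfix must occur, in order, after the previous segment.
    for postfix in postfixes:
        m = re.compile(postfix, re.DOTALL).search(payload, pos)
        if m is None:
            return False
        pos = m.end()
    return True


rule = "GET /admin.*passwd.*root"
print(two_stage_match(rule, "GET /admin?x=1 passwd=1 user=root"))  # True
print(two_stage_match(rule, "GET /admin?x=1 user=root"))           # False
```

The point of the split in this sketch is that each segment can be compiled into its own small automaton, instead of one DFA that must track every way the `.*` gaps can interleave. For segments of variable length, the greedy leftmost match used here can miss valid matches, so a real engine would have to track the earliest end position of each segment instead.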