In deep sub-micron (DSM) technology, wires are equally or more important than logic components since wire-related problems such as crosstalk, noise are much critical in system-on-chip (SoC) design. Recently, a method [12] for generating a partial product reduction tree (PPRT) with optimal-timing using bit-level adders to implement arithmetic circuits, which outperforms the current best designs, is proposed. However, in the conventional approaches including [12], interconnects are not primary components to be optimized in the synthesis of arithmetic circuits, mainly due to its high integration complexity or unpredictable wire effects, thereby resulting in unsatisfactory layout results with long and messed wire connections. To overcome the limitation, we propose a new module generation/synthesis algorithm for arithmetic circuits utilizing carry-save-adder (CSA) modules, which not only optimizes the circuit timing but also generates a much regular interconnect topology of the final circuits. Specifically, we propose a two-step algorithm: (Phase 1: CSA module generation) we propose an optimal-timing CSA module generation algorithm for an arithmetic expression under a general CSA timing model; (Phase 2: Bit-level interconnect refinements) we optimally refine the interconnects between the CSA modules while retaining the global CSA-tree structure produced by Phase 1. It is shown that the timing of the circuits produced by our approach is equal or almost close to that by [12] in most testcases (even without including the interconnect delay), and at the same time, the interconnects in layout are significantly short and regular.