Division is generally regarded as a low-frequency, high-latency operation in integer operations. Division is also the operation that stalls the processor pipeline most frequently. In order to improve the overall performance of embedded processors, a low-delay divider for embedded processors was designed. Based on the non-restoring algorithm, the divider uses a compound adder to execute addition and subtraction simultaneously and reduces the iteration path delay. By shifting the operands to align the most effective bits, the divider dynamically adjusts the number of iteration cycles to reduce the average number of cycles in the division process. The divider design was simulated by Modelsim and implemented on a FPGA board for verification. Synthesized in a Semiconductor Manufacturing International Corporation (SMIC) 65 nm Low Leakage process, the achieved frequency of the design was up to 500 MHz and the area cost was 5670.36 μm2. Compared with other dividers, the proposed divider design can reduce the delay of single iteration by up to 45.3%, save the average number of iteration cycles by 20–50%, and save the area by 23.3–86.1%. Compared with other dividers implemented on FPGA, it saves LUTs by 36.47–59.6% and FFs by 67–84.28%, runs 2–6.36 times faster. Therefore, the proposed design is suitable for embedded processors that require low power consumption, low resource consumption, and high performance.