Abstract-Due to the widespread use and inherent complexity of floating-point addition, much effort has been devoted to its speedup via algorithmic and circuit techniques. We propose a new redundant-digit representation for floating-point numbers that leads to computation speedup in two ways: 1) Reducing the per-operation latency when multiple floating-point additions are performed before result conversion to nonredundant format and 2) Removing the addition associated with rounding. While the first of these advantages is offered by other redundant representations, the second one is unique to our approach, which replaces the power-and area-intensive rounding addition by low-latency insertion of a rounding two-valued digit, or twit, in a position normally assigned to a redundant twit within the redundant-digit format. Instead of conventional sign-magnitude representation, we use a sign-embedded encoding that leads to lower hardware redundancy, and thus, reduced power dissipation. While our intermediate redundant representations remain incompatible with the IEEE 754-2008 standard, many application-specific systems, such as those in DSP and graphics domains, can benefit from our designs. Description of our radix-16 redundant representation and its addition algorithm is followed by the architecture of a floating-point adder based on this representation. Detailed circuit designs are provided for many of the adder's critical subfunctions. Simulation and synthesis based on a 0:13 m CMOS standard process show a latency reduction of 15 percent or better, and both area and power savings of around 58 percent, compared with the best designs reported in the literature.