Multipliers are critical functional units in many systems, such as Digital Signal Processing (DSP) and machine learning. The overall performance of such systems depends on the efficiency of their multipliers. However, multipliers are slow and power-inefficient components due to their complex circuitry, so we aim to reduce their power consumption by relaxing their accuracy requirements while at the same time enhancing their speed. In this paper, we present a fast, power-aware multiplier that targets error-resilient systems. This is achieved by using our proposed approximation algorithm, a hybrid Wallace tree technique for reducing power consumption, and a hybrid ripple-carry adder for reducing latency. The proposed approximation algorithm is implemented using both a modified bit-width-aware technique and a carry-in prediction technique, while the proposed hybrid Wallace tree is implemented using high-order counters. These designs are implemented in an HDL, then synthesized and simulated using the Quartus and ModelSim tools. For a 16-bit multiplier, a mean accuracy of 98.35% to 99.95% was achieved with a 45.77% reduction in power, a 21.48% drop in latency, and a 34.95% reduction in area. In addition, our design performs even better for larger multipliers: for a 32-bit multiplier, a 61.24% reduction in power was achieved, with an 8.74% drop in latency and a 35.24% reduction in area, with almost no loss in accuracy.

Many multiplier designs have been proposed to improve common performance metrics such as area, speed, and/or power consumption. The Booth algorithm,[1] Wallace tree,[2,3] Dadda tree,[4] and array multiplier[5] are examples of such designs.
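To make the Wallace-tree idea concrete, the sketch below models, in software, an unsigned multiply that generates partial products and reduces them column by column with 3:2 counters (full adders) until two rows remain, followed by one final carry-propagate addition. This is a minimal illustrative model, not the paper's hybrid design; the function name and bit width are assumptions.

```python
def wallace_multiply(a: int, b: int, n: int = 8) -> int:
    """Unsigned n-bit multiply modeling Wallace-tree reduction:
    partial products are reduced column-wise with 3:2 counters
    (full adders) until at most two bits per column remain."""
    # Step 1: generate partial products; column k holds bits of weight 2**k.
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            if (a >> i) & (b >> j) & 1:
                cols[i + j].append(1)

    # Step 2: reduce with 3:2 counters until every column has <= 2 bits.
    while any(len(c) > 2 for c in cols):
        new_cols = [[] for _ in range(2 * n)]
        for k, c in enumerate(cols):
            while len(c) >= 3:
                x, y, z = c.pop(), c.pop(), c.pop()
                s = x ^ y ^ z                        # sum stays in column k
                carry = (x & y) | (x & z) | (y & z)  # carry moves to k + 1
                new_cols[k].append(s)
                if carry and k + 1 < 2 * n:
                    new_cols[k + 1].append(carry)
            new_cols[k].extend(c)  # leftover 1 or 2 bits pass through
        cols = new_cols

    # Step 3: final carry-propagate addition of the two remaining rows.
    result, carry = 0, 0
    for k in range(2 * n):
        total = sum(cols[k]) + carry
        result |= (total & 1) << k
        carry = total >> 1
    return result
```

In hardware, each `while` pass corresponds to one layer of full adders operating in parallel, which is why the tree's depth grows only logarithmically with the number of partial-product rows.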
The Wallace tree multiplier is one of the best parallel designs for reducing a multiplier's latency by adding the partial products in parallel.

Applications such as multimedia,[6] neural networks,[7] DSP filtering,[8] and machine learning are error tolerant and do not require perfect computational accuracy; hence, an approximate result is sufficient. In multimedia applications, for example, precise results are not always required because human perception is limited. For such applications, implementations can be relaxed in order to reduce power consumption, accelerate computation, and minimize area, thus achieving better performance. Approximate computing is an emerging computing paradigm for enhancing the performance of error-tolerant applications.[9-11] According to Han and Orshansky,[10] applications suitable for approximate computing can be classified into four classes: applications with analog inputs, applications with analog outputs, applications with no unique answer, and, lastly, iterative and convergent applications.

Adders and multipliers are the two main components targeted for approximation. Many research works have been conducted on approximate adder implementations, such as those of Gupta et al and Zhu et al. On the other hand, fewer works exist in the field of approximate multipliers. Some algorithms used for approximate multipliers are truncation, ...
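As a simple illustration of the truncation approach mentioned above, the sketch below zeroes the t least-significant bits of each operand before multiplying. This is only one simple variant, assumed here for illustration; published truncation schemes more often drop low-order partial-product columns inside the multiplier, and the function name and parameters are not from this paper.

```python
def truncated_multiply(a: int, b: int, t: int = 4) -> int:
    """Approximate multiply: clear the t least-significant bits of
    each operand, then multiply exactly. Trades accuracy in the low
    bits for a smaller, faster, lower-power datapath in hardware."""
    mask = ~((1 << t) - 1)  # e.g. t=4 -> ...11110000
    return (a & mask) * (b & mask)

# Worst-case style example for 8-bit operands with t = 4:
# exact:  255 * 255 = 65025
# approx: truncated_multiply(255, 255, 4) = 240 * 240 = 57600
```

The error is largest when the discarded low bits are all ones; for random inputs the average relative error shrinks rapidly as the operand width grows relative to t, which is why truncation suits the larger (16- and 32-bit) multipliers discussed above.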