The Multiply-Accumulate (MAC) operation is the backbone of Least Mean Squares (LMS) digital adaptive filters. When LMS is implemented in hardware as a Fully Dedicated Architecture (FDA), the multiplier becomes the bottleneck for higher-order filters: it drives up area, cost and power requirements and renders the design unsuitable for practical implementation. In this paper, we propose a composite design that uses Distributed Arithmetic (DA) to replace the bottleneck multiplier with memory units that store Partial Products (PPs) to emulate multiplication. The depth of these memory units grows exponentially with the filter order. To manage this growth, we use the Half Memory (HM) algorithm and Offset Binary Coding (OBC) to restructure the PPs so that the memory size is reduced by at least a factor of 4 for the same filter order. The proposed design improves the system's Throughput, Critical Path Delay, Power Consumption and FPGA Resource Utilization. However, it introduces Latency in both the output and update segments of the LMS algorithm. To offer a trade-off between resource utilization and latency, we suggest a mechanism that halves this latency by processing the even and odd bits of the input bit stream in parallel. We also propose a method that reduces the latency of the update module at a slight expense of other design attributes. The proposed design is flexible owing to its dynamic memory structure and the option to choose between latency and resource minimization. Simulations were carried out in Xilinx Vivado, and conclusions were drawn by comparing the FDA- and DA-based designs. Results for a 16-tap filter indicate improvements in Throughput, Area Utilization and Power Consumption of 18%, 5% and 3.5% respectively, at the expense of a 4× increase in latency.
The Half Latency method reduces the latency by a factor of 2, at the cost of slightly higher power consumption and area utilization.
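
As a rough illustration of the DA idea summarized above, the following Python sketch replaces the per-tap multiplications of an inner product with lookups into a table of precomputed partial products. It is a minimal software model under stated assumptions, not the proposed hardware design: the function names `build_lut` and `da_inner_product` are hypothetical, the weights are fixed integers (an LMS filter would also have to refresh the table as the weights adapt), the samples are signed two's-complement integers, and neither the HM nor the OBC memory reduction is applied.

```python
def build_lut(weights):
    """Precompute all 2**N partial products: one sum per subset of the weights."""
    n = len(weights)
    return [
        sum(w for k, w in enumerate(weights) if (addr >> k) & 1)
        for addr in range(1 << n)
    ]


def da_inner_product(lut, samples, bits):
    """Bit-serial inner product of `bits`-wide two's-complement samples.

    One table lookup is performed per bit position, so the loop runs
    `bits` times regardless of the number of taps.
    """
    acc = 0
    for j in range(bits):
        # Form the address from bit j of every sample (one address bit per tap).
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> j) & 1) << k
        term = lut[addr] << j  # shift replaces multiplication by 2**j
        # The most significant bit has negative weight in two's complement.
        acc += -term if j == bits - 1 else term
    return acc


# Illustrative values only (assumed, not taken from the paper).
weights = [3, -1, 4, 2]
samples = [5, -7, 2, -8]  # each fits in 4-bit two's complement
lut = build_lut(weights)
result = da_inner_product(lut, samples, bits=4)
```

The table holds 2**N entries for N taps, which is the exponential memory growth noted in the abstract, and the one-lookup-per-bit loop is the source of the added latency relative to a parallel multiplier.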