Feature extraction is a key stage of a speech recognition system. The use of speech recognition systems in low-resource devices (LRDs) has increased significantly in recent years. Implementing feature extraction, which involves complex computations, on an LRD is challenging because such devices have limited energy, storage, and processing power. An optimum design must carefully balance performance metrics, including speed, area, and energy. The objective of this research is to implement and model a speech feature extraction design on a field-programmable gate array (FPGA) platform and to identify the optimum implementation for LRDs. The novelty of this research lies in optimising feature extraction implementations using design options such as word size, and in developing accurate performance models to guide future designs. The study extensively examines the effect of the fixed-point n-bit word size on the FPGA implementation of Mel frequency cepstral coefficient feature extraction. The results show that the performance metrics (area, power, and energy) grow more slowly than n because the dependency of some blocks (e.g. the logarithm) on n is non-linear. For example, increasing n by 50% increases resource utilisation by 38%, power by 41%, and energy by 41%. Models for resources, power, and energy are developed with accuracies of 5.1%, 4.5%, and 4.3%, respectively. Furthermore, n has a weak impact on timing results, so speed is nearly the same across implementations. Each additional bit of n costs 690 logic elements in area, 2.35 mW in power, and 0.55 μJ in energy. For LRDs, the 32-bit design is the best option, followed by the 48-bit and 24-bit designs.
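The per-bit costs quoted above can be turned into a quick estimate of what a change in word size costs. The following is a minimal sketch only, not the authors' performance model: it assumes the quoted per-bit figures act as a constant slope between any two word sizes, and the names `COST_PER_BIT` and `incremental_cost` are hypothetical, introduced for illustration.

```python
# Per-bit marginal costs as quoted in the abstract.
COST_PER_BIT = {
    "logic_elements": 690.0,  # logic elements per additional bit
    "power_mW": 2.35,         # milliwatts per additional bit
    "energy_uJ": 0.55,        # microjoules per additional bit
}


def incremental_cost(n_from, n_to):
    """Estimate the change in each metric when the word size moves
    from n_from bits to n_to bits.

    Assumption (not from the source): the per-bit cost is a constant
    slope over the whole range, so the change is slope * (n_to - n_from).
    """
    delta_bits = n_to - n_from
    return {metric: slope * delta_bits for metric, slope in COST_PER_BIT.items()}


if __name__ == "__main__":
    # Estimated extra cost of widening a 24-bit design to 32 bits.
    for metric, delta in incremental_cost(24, 32).items():
        print(f"{metric}: {delta:+.2f}")
```

Under this constant-slope assumption, widening the word size from 24 to 32 bits adds about 5520 logic elements, 18.8 mW, and 4.4 μJ. The sketch estimates differences only. It does not predict absolute resource, power, or energy figures, which would need the fixed (n-independent) part of each model.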