This work realises a memory-efficient embedded automatic speech recognition (ASR) system on a resource-constrained platform. A buffering method, ultra-low queue-accumulator buffering, is presented that makes efficient use of the limited memory when extracting linear prediction cepstral coefficient (LPCC) features in the embedded ASR system. The LPCC order is evaluated to find the value that best balances recognition accuracy against computational cost. In the decoding stage, the proposed enhanced cross-words reference templates (CWRTs) method is incorporated into template matching to achieve speaker-independent recognition without the large memory burden of the conventional CWRTs method. The proposed techniques are implemented on a 16-bit GPCE063A microprocessor platform with a 49.152 MHz clock, using a sampling rate of 8 kHz. Experimental results show that recognition accuracy reaches 95.22% in a 30-sentence speaker-independent embedded ASR task while using only 0.75 kB of RAM.
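For readers unfamiliar with the LPCC feature, the following is a minimal floating-point sketch of the textbook computation: frame autocorrelation, the Levinson-Durbin recursion for the LPC coefficients, and the standard LPC-to-cepstrum recursion. It is not the ultra-low queue-accumulator buffering scheme or the fixed-point implementation of this work; the function names, the frame length of 240 samples, and the order of 10 in the usage lines are illustrative assumptions, not values taken from the paper.

```python
import math


def autocorr(frame, order):
    """Autocorrelation r[0..order] of one speech frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i - k] for i in range(k, n))
            for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..order] (a[0] is unused).

    Convention: s[n] is predicted as sum_k a[k] * s[n - k].
    """
    a = [0.0] * (order + 1)
    err = r[0]
    if err <= 0.0:  # silent frame: return an all-zero predictor
        return a
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
        if err <= 0.0:  # perfectly predictable signal: stop early
            break
    return a


def lpc_to_cepstrum(a, order):
    """Return cepstral coefficients c_1..c_order of the all-pole model."""
    c = [0.0] * (order + 1)
    for n in range(1, order + 1):
        acc = a[n]
        for k in range(1, n):
            acc += (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c[1:]


def lpcc(frame, order):
    """LPCC feature vector of one frame (hypothetical helper)."""
    r = autocorr(frame, order)
    return lpc_to_cepstrum(levinson_durbin(r, order), order)


# Usage: one 30 ms frame at 8 kHz (240 samples, an assumed frame length)
# of a synthetic two-tone signal, with an assumed LPCC order of 10.
frame = [math.sin(2 * math.pi * 440 * t / 8000)
         + 0.5 * math.sin(2 * math.pi * 1200 * t / 8000)
         for t in range(240)]
features = lpcc(frame, 10)
```

Both recursions cost on the order of p² multiply-accumulate operations per frame for order p, which is why the choice of LPCC order directly trades recognition accuracy against computation on a small microprocessor.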