This paper describes a secure and memory-efficient embedded fingerprint verification system. It shows how a fingerprint verification module originally developed to run on a workstation can be transformed and optimized in a systematic way to run real-time on an embedded device with limited memory and computation power. A complete fingerprint recognition module is a complex application that requires in the order of 1000 M unoptimized floating-point instruction cycles. The goal is to run both the minutiae extraction and the matching engines on a small embedded processor, in our case a 50 MHz LEON-2 softcore. It does require optimization and acceleration techniques at each design step. In order to speed up the fingerprint signal processing phase, we propose acceleration techniques at the algorithm level, at the software level to reduce the execution cycle number, and at the hardware level to distribute the system work load. Thirdly, a memory trace map-based memory reduction strategy is used for lowering the system memory requirement. Lastly, at the hardware level, it requires the development of specialized coprocessors. As results of these optimizations, we achieve a 65% reduction on the execution time and a 67% reduction on the memory storage requirement for the minutiae extraction process, compared against the reference implementation. The complete operation, that is, fingerprint capture, feature extraction, and matching, can be done in real-time of less than 4 seconds.