Abstract-In most embedded and general purpose architectures, stack data and non-stack data is cached together, meaning that writing to or loading from the stack may expel non-stack data from the data cache. Manipulation of the stack has a different memory access pattern than that of non-stack data, showing higher temporal and spatial locality. We propose caching stack and non-stack data separately and develop four different stack caches that allow this separation without requiring compiler support. These are the simple, window, and prefilling with and without tag stack caches. The performance of the stack cache architectures was evaluated using the SimpleScalar toolset where the window and prefilling stack cache without tag resulted in an execution speedup of up to 3.5% for the MiBench benchmarks, executed on an out-oforder processor with the ARM instruction set.