This paper presents the first automatic scheme to allocate local (stack) data in recursive functions to scratch-pad memory (SPM) in embedded systems. A scratch-pad is a fast, directly addressed, compiler-managed SRAM memory that replaces the hardware-managed cache. Its use is motivated by its significantly lower access time, lower energy consumption, better real-time bounds, smaller area, and lower overall runtime. Existing compiler methods for allocating data to scratch-pad can place only code, global, heap, and non-recursive stack data in scratch-pad memory. Stack data for recursive functions is allocated entirely in DRAM, which results in poor performance. In this paper we present a dynamic yet compiler-directed allocation method for recursive-function stack data that, for the first time, is able to place a portion of recursive stack data in scratch-pad. It has almost no software-caching overhead and can move recursive function data back and forth between scratch-pad and DRAM to better track the program's locality characteristics. With our method, all code, global, stack, and heap variables can share the same scratch-pad. Compared to placing all recursive function data in DRAM and all other variables in scratch-pad, our method reduces the average runtime of our benchmarks by 29.3% and the average power consumption by 31.1%, for the same scratch-pad size, fixed at 5% of total data size. Furthermore, we observed significant savings when comparing our method against cache-based alternatives to SPM allocation. Finally, we analyze the effects of profile variation on our allocation approach and present a modified version of our method that minimizes variation for profile-based allocations.