The time-frequency tiling, bit allocation and the quantizer of most perceptual coding algorithms is either fixed or controlled by a perceptual model. The large variety of existing audio signals, each exhibiting different coding requirements due to their different temporal and spectral fine-structure suggests to use a signal-adaptive algorithm. The framework which is described in this paper makes use of a signal-adaptive wavelet filterbank which allows to switch any node of the wavelet-packet tree individually. Therefore each subband can have an individual timesegmentation and the overall time-frequency tiling can be adapted to the signal using optimization techniques. A rate-distortion optimality can be defined which will minimize the distortion for a given rate in every subband, based on a perceptual model. Due to the additivity of the rate and distortion measure over disjoint covers of the input signal, an overall cost function including the switching cost for the filterbank switching can be defined. By the use of dynamic programming techniques, the wavelet-packet tree can be pruned based on a top-down or bottom-up "split-merge" decision in every node of the wavelet-tree. Additionally we can profit from temporal masking due to the fact that each subband can have an individual segmentation in time without introducing time domain artifacts such as pre-echo distortion.