Nowadays embedded systems are growing at an impressive rate and provide more and more sophisticated applications characterized by having a complex array index manipulation and a large number of data accesses. Those applications require high performance specific computation that general purpose processors can not deliver at a reasonable energy consumption. Very long instruction word architectures seem a good solution providing enough computational performance at low power with the required programmability to speed up the time to market. Those architectures rely on compiler effort to exploit the available instruction and data parallelism to keep the data path busy all the time. With the density of transistors doubling each 18 months, more and more sophisticated architectures with a high number of computational resources running in parallel are emerging. With this increasing parallel computation, the access to data is becoming the main bottleneck that limits the available parallelism. To alleviate this problem, in current embedded architectures, a special unit works in parallel with the main computing elements to ensure efficient feed and storage of the data: the address generator unit, which comes in many flavors. Future architectures will have to deal with enormous memory bandwidth in distributed memories and the development of address generators units will be crucial for effective next generation of embedded processors where global trade-offs between reactiontime, bandwidth, energy and area must be achieved. This paper provides a survey of methods and techniques that optimize the address generation process for embedded systems, explaining current research trends and needs for future.