In the past few decades, technological advances have brought many improvements to the field of human rehabilitation. Among these improvements, the delivery of rehabilitation services to users at distant locations, known as telerehabilitation, has made it possible to provide patients with home-based therapy, reducing the need for them to travel to rehabilitation centers. In addition, the use of virtual and augmented training environments has enabled realistic simulation in the rehabilitation process, enhancing the patient experience and reducing exposure to real-world risks. Despite these benefits, when used in a distributed context, Augmented Reality technology brings additional challenges to the development process, including the integration of software components provided by heterogeneous sources. Furthermore, robust tracking techniques that are less prone to detection errors caused by network transmission are needed. This requirement motivated the development of alternative tracking algorithms, including the use of deep neural networks for visual tracking. When designing these trackers, the major role played by the choice of network topology has led to the investigation of automatic search techniques, including evolutionary computation algorithms. In this context, this work contributes a software architecture model for Augmented Reality that focuses on the reliable integration of independent software components for augmented content streaming. In addition, deep visual trackers for fiducial marker detection were designed and evaluated, and they outperformed other trackers when processing remotely captured video frames. Finally, the application of an evolutionary algorithm to topology selection for deep neural networks is discussed, and it achieves results that are competitive with state-of-the-art methods.