Reconfigurable intelligent surface (RIS) has been widely discussed as new technology to improve wireless communication performance. Based on the unique design of RIS, its elements can reflect, refract, absorb, or focus the incoming waves toward any desired direction. These functionalities turned out to be a major solution to overcome millimeter-wave (mmWave)s high propagation conditions including path attenuation and blockage. However, channel estimation in RIS-aided communication is still a major concern due to the passive nature of RIS elements, and estimation overhead that arises with multiple-input multiple-output (MIMO) system. As a consequence, user tracking has not been analyzed yet. This paper is the first work that addresses channel estimation, beamforming, and user tracking under practical mmWave RIS-MIMO systems. By providing the mathematical relation of RIS design with a MIMO system, a three-stage framework is presented. Starting with estimating the channel between a base station (BS) and RIS using hierarchical beam searching, followed by estimating the channel between RIS and user using an iterative resolution algorithm. Lastly, a popular tracking algorithm is employed to track channel parameters between the RIS and the user. System analysis demonstrates the robustness and the effectiveness of the proposed framework in real-time scenarios.