We describe the design and implementation of an autonomous prototype vehicle that finds an empty parking slot in a parking area and parks itself in that slot, using neural networks trained with deep reinforcement learning (RL). To perform the autonomous parking procedure, two different artificial neural networks (ANNs) are trained using a deep RL algorithm in a simulation environment and embedded into the computing platform of the prototype car. One of the ANNs enables the vehicle to drive autonomously in the parking environment. At the same time, an image processing algorithm is used to determine whether a parking slot is empty. When the image processing algorithm finds a suitable parking slot, the second ANN is activated and performs a safe parking procedure. However, ANN-based machine learning techniques demand high processing power and impose a heavy computational burden on embedded CPU and GPU platforms. To alleviate this burden, one can achieve higher performance and lower power consumption with an application-specific hardware design, in which logic resources are tailored to the algorithm of interest in an energy-efficient manner. In this article, hardware accelerators for our ANN models are designed and generated with the Vivado high-level synthesis (HLS) tool, targeting ZedBoard, an ARM-based programmable SoC platform. Our ANN accelerators achieve a speedup of 17x compared to an ARM software implementation. For the deeper fully connected layers used in deep RL-based solutions, function-level parallelism (Vivado's dataflow) is employed to improve computational efficiency. Our proposed stage-level description for fully connected layers outperforms recent studies in terms of computation time.
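
To make the idea of function-level parallelism concrete, the following is a minimal C++ sketch of how fully connected layers might be written as separate stages so that an HLS tool can overlap them. It is an illustration under stated assumptions, not the accelerator described in this article: the layer sizes (`N_IN`, `N_HID`, `N_OUT`), the names `fc_stage` and `mlp_top`, the use of ReLU on every layer, and the `float` data type are all hypothetical choices made for the example.

```cpp
#include <array>
#include <cstddef>

// Hypothetical layer sizes, chosen only for illustration.
constexpr std::size_t N_IN = 4, N_HID = 3, N_OUT = 2;

template <std::size_t IN, std::size_t OUT>
using Weights = std::array<std::array<float, IN>, OUT>;

// One fully connected stage: out = relu(W * in + b).
// ReLU on every layer is an assumption of this sketch.
template <std::size_t IN, std::size_t OUT>
void fc_stage(const std::array<float, IN>& in,
              const Weights<IN, OUT>& w,
              const std::array<float, OUT>& b,
              std::array<float, OUT>& out) {
    for (std::size_t o = 0; o < OUT; ++o) {
        float acc = b[o];
        for (std::size_t i = 0; i < IN; ++i) {
            acc += w[o][i] * in[i];  // multiply-accumulate
        }
        out[o] = acc > 0.0f ? acc : 0.0f;  // ReLU
    }
}

// Top-level function: each layer is its own function call, so the calls
// form a chain of stages connected by the intermediate buffer `h`.
// In Vivado HLS, `#pragma HLS DATAFLOW` placed at the top of this body is
// the directive that requests overlapped execution of such stages. It is
// kept as a comment here so the sketch builds with an ordinary compiler.
void mlp_top(const std::array<float, N_IN>& x,
             const Weights<N_IN, N_HID>& w1,
             const std::array<float, N_HID>& b1,
             const Weights<N_HID, N_OUT>& w2,
             const std::array<float, N_OUT>& b2,
             std::array<float, N_OUT>& y) {
    std::array<float, N_HID> h{};          // buffer between the two stages
    fc_stage<N_IN, N_HID>(x, w1, b1, h);   // stage 1: hidden layer
    fc_stage<N_HID, N_OUT>(h, w2, b2, y);  // stage 2: output layer
}
```

Written this way, the second stage consumes only what the first stage produces, which is the producer-consumer structure that a dataflow directive relies on. How much overlap is actually achieved depends on the tool and on how the intermediate buffer is implemented.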