<p>There is a growing demand for hardware systems that are computationally efficient and energy-efficient in order to accelerate compute-intensive applications. Meeting this demand raises design challenges, such as memory access and GPU throughput, that involve trade-offs among performance, power, and resources. A relatively new way of addressing these challenges is to accelerate and run parallel computing systems with the aid of machine learning (ML) techniques. The objective of this thesis is to explore the hardware design requirements and select appropriate design choices for multi-core architectures, based on different ML models, to address these challenges. The thesis contributes effective hardware architectures that trade off design objectives, focusing on the following four distinct areas:</p>
<ul>
<li>Utilization of low-power cache designs. An artificial neural network (ANN) predictive model is used to optimize a hybrid cache of spin-transfer torque random-access memory (STT-RAM) and static random-access memory (SRAM) for multi-core chips. The simulation results demonstrate that this approach yields significant power-aware improvements across different workloads.</li>
<li>Development of algorithms that facilitate power savings in three-dimensional network-on-chip (3D-NoC) design. A novel ANN-based approach predicts the most suitable routing algorithm, using an extracted power-performance trade-off function as the prediction target (see the illustrative sketch after this list). The obtained results show high throughput with low power consumption and reduced thermal hotspots.</li>
<li>Enhancement of power-aware resource use for general-purpose graphics processing units (GPGPUs). A customized model is obtained by matching the available chip units to each application, with the energy-delay product adopted as the target metric for the proposed ANN model. The results show that when the GPU platform is tuned to the resources an application actually requires, prediction accuracy improves and power consumption drops considerably. This methodology gives a running application more flexibility to meet power constraints than always using the maximum hardware of the GPU system.</li>
<li>Improvement of the performance of the 3D-NoC accelerator platform. The integrated 3D-NoC-based ANN simulator was extended to support interconnection routing over a torus topology. The simulation results for the case study show that the modified platform achieves lower latency and reduced power consumption, especially at larger NoC dimensions.</li>
</ul>
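<p>As a concrete illustration of the prediction mechanism shared by these contributions, the sketch below shows an ANN classifier that selects a hardware configuration (here, a 3D-NoC routing algorithm) from profiled workload features, trained on whichever configuration gave the lowest measured energy-delay product (energy multiplied by execution time) in offline simulation. This is a minimal hypothetical example: the feature set, sample values, labels, and scikit-learn setup are illustrative assumptions, not the thesis's actual models or data.</p>
<pre><code>
# Hypothetical sketch: ANN-based selection of a hardware configuration
# (e.g., a 3D-NoC routing algorithm) from profiled workload features.
# Training labels are the configurations with the lowest measured
# energy-delay product (EDP = energy * execution time) in offline runs.
# All feature names, values, and labels are illustrative assumptions.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed per-workload features collected from simulator profiling:
# [cache miss rate, average hop count, packet injection rate, IPC]
X_train = np.array([
    [0.12, 3.1, 0.05, 1.8],
    [0.35, 4.6, 0.20, 0.9],
    [0.08, 2.7, 0.03, 2.1],
    [0.28, 5.2, 0.15, 1.1],
    [0.40, 5.8, 0.22, 0.8],
    [0.15, 3.4, 0.07, 1.6],
])

# Label = index of the routing algorithm with the lowest EDP
# for that workload in the offline simulation sweep.
y_train = np.array([0, 2, 0, 1, 2, 0])

# Small multilayer perceptron; inputs are standardized first.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0),
)
model.fit(X_train, y_train)

# At run time, profile a new workload and predict the configuration
# expected to minimize the energy-delay product.
new_workload = np.array([[0.30, 4.9, 0.18, 1.0]])
print("Predicted routing algorithm index:", model.predict(new_workload)[0])
</code></pre>
<p>Under the same assumptions, a similar structure could be applied to the other contributions by swapping the features and labels (for example, hybrid STT-RAM/SRAM cache configuration, or GPGPU resource allocation), although the thesis's actual feature sets and training procedure are not reproduced here.</p>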