For many years GPUs have been components of HPC clusters (e.g., Titan and Piz Daint), while only in recent years has the Intel® Xeon Phi™ been included (e.g., Tianhe-2 and Stampede). For example, GPUs appear in 14% of the systems on the November 2015 Top500 list, while the Xeon Phi™ appears in 6%. Intel® introduced the Xeon Phi™ to compete with NVIDIA GPUs by offering a unified environment that supports OpenMP and MPI, and by providing competitive, easier-to-utilize processing power with lower energy consumption. Achieving maximum Xeon Phi™ execution-time performance requires programs with high data parallelism and good scalability that use parallel algorithms. Moreover, improved Xeon Phi™ power performance and throughput can be achieved by reducing the number of cores employed for application execution. Accordingly, the objectives of this paper are to: (1) Demonstrate that some applications can be executed with fewer cores than are available to users with a negligible impact on execution time: for 59.3% of the 27 application instances studied, doing so results in better performance, and for 37%, using fewer than half of the available cores results in a performance degradation of not more than 10% in the worst case. (2) Develop a tool that provides the user with the optimal number of cores to employ: we designed an algorithm and developed a plugin for the Periscope Tuning Framework, an automatic performance tuner, that for a given application provides the user with an estimate of this number.
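The core-count selection underlying objective (2) can be illustrated with a minimal sketch. This is not the Periscope plugin's actual algorithm; `measure` is a hypothetical callback that runs the application on a given number of cores and returns its execution time, and the 10% tolerance mirrors the worst-case degradation bound cited above:

```python
# Illustrative sketch (assumed logic, not the paper's plugin): choose the
# smallest core count whose measured runtime stays within `tolerance` of the
# best runtime observed across all tested configurations.

def smallest_good_core_count(measure, core_counts, tolerance=0.10):
    # `measure(n)` is a hypothetical callback returning runtime on n cores.
    times = {n: measure(n) for n in core_counts}
    best = min(times.values())
    # Keep configurations within the slowdown tolerance; prefer fewest cores.
    acceptable = [n for n, t in times.items() if t <= best * (1 + tolerance)]
    return min(acceptable)

# Synthetic timings: runtime flattens out beyond 30 cores, so 30 is chosen.
fake_times = {10: 20.0, 20: 12.0, 30: 10.0, 40: 9.8, 50: 9.9, 60: 9.7}
print(smallest_good_core_count(fake_times.get, sorted(fake_times)))  # → 30
```

A real tuner would of course measure runs of the application itself rather than look up synthetic timings, and would likely prune the search rather than test every core count.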
(3) Understand whether performance metrics can be used to identify applications that can be executed with fewer cores with a negligible impact on execution time: via statistical analysis, we identified the following three metrics as indicative of this, at least for the application instances studied: a low L1 compute-to-data-access ratio, i.e., the average number of computations performed per byte of data loaded from or stored to the L1 cache; high use of data bandwidth; and, to a lesser extent, low vectorization intensity.
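The three indicators above can be combined into a simple screening rule. The following sketch is purely illustrative: the paper identifies the metrics, but the threshold values and weighting below are invented assumptions, with vectorization intensity weighted less because it is the weaker indicator:

```python
# Hedged illustration: flag an application as a likely candidate for core
# reduction from three profile metrics. All cutoffs (cda_max, bw_min,
# vec_max) are invented for illustration, not taken from the paper.

def core_reduction_candidate(l1_cda, bandwidth_util, vec_intensity,
                             cda_max=2.0, bw_min=0.7, vec_max=4.0):
    low_cda = l1_cda < cda_max           # few computations per byte through L1
    high_bw = bandwidth_util > bw_min    # high fraction of peak data bandwidth
    low_vec = vec_intensity < vec_max    # few elements per vector instruction
    # Weight vectorization intensity less, reflecting its weaker indication.
    score = 2 * low_cda + 2 * high_bw + 1 * low_vec
    return score >= 4

# A bandwidth-bound, poorly vectorized instance is flagged; a compute-bound,
# well-vectorized one is not.
print(core_reduction_candidate(1.2, 0.85, 3.0))   # → True
print(core_reduction_candidate(8.0, 0.30, 12.0))  # → False
```

In practice such thresholds would be fitted to the measured application instances rather than fixed a priori.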