The emerging Dark Silicon limitation has led application designers to carefully consider the available Thermal Design Power (TDP) budget, hardware resources, and software characteristics. In this paper, we propose a hierarchical scheme for distributing resources and the TDP budget among concurrently executing applications with multi-threaded workloads under throughput constraints. Each application's TDP budget is then partitioned among its threads according to their workloads, and this partitioning can be fine-tuned at run time to account for workload variations. We evaluate our scheme on a next-generation, multi-threaded High Efficiency Video Coding (HEVC) codec and demonstrate up to 30.86% higher throughput than the state of the art.
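The following is a minimal sketch of the two-level idea (chip TDP to applications, then application budget to threads). It is not the authors' algorithm. It assumes a simple proportional policy in which each application receives a share of the chip TDP proportional to a hypothetical power demand (for example, an estimate of the power needed to meet its throughput constraint), and each thread receives a share of its application's budget proportional to a hypothetical workload estimate. The names `proportional_split`, `hierarchical_tdp`, `app_demands`, and `thread_workloads` are illustrative only. Throughput constraints are not enforced here, and run-time fine-tuning would amount to re-running the thread-level split with updated workload estimates.

```python
from typing import Dict, List


def proportional_split(budget: float, weights: List[float]) -> List[float]:
    """Split `budget` (watts) in proportion to non-negative `weights`."""
    if not weights:
        return []
    total = sum(weights)
    if total <= 0:
        # No usable workload information: fall back to an even split.
        return [budget / len(weights)] * len(weights)
    return [budget * w / total for w in weights]


def hierarchical_tdp(chip_tdp: float,
                     app_demands: Dict[str, float],
                     thread_workloads: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Hypothetical two-level split: chip TDP -> applications -> threads.

    app_demands: assumed per-application power demand estimates.
    thread_workloads: assumed per-thread workload estimates for each application.
    """
    apps = list(app_demands)
    # Level 1: divide the chip-wide TDP among applications.
    app_budgets = proportional_split(chip_tdp, [app_demands[a] for a in apps])
    # Level 2: divide each application's budget among its threads.
    return {
        app: proportional_split(budget, thread_workloads[app])
        for app, budget in zip(apps, app_budgets)
    }


if __name__ == "__main__":
    budgets = hierarchical_tdp(
        60.0,
        {"app_a": 2.0, "app_b": 1.0},
        {"app_a": [3.0, 1.0], "app_b": [1.0, 1.0, 2.0]},
    )
    print(budgets)
```

With a 60 W budget, `app_a` receives 40 W (split 30 W / 10 W between its threads) and `app_b` receives 20 W (split 5 W / 5 W / 10 W). A proportional rule is used only because it is the simplest policy that respects the total budget at both levels.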