Large-scale integration of multiple cores on a single chip is the current answer to the challenge of attaining higher computation throughput while keeping power consumption within acceptable limits. Network-on-Chip (NoC) is an emerging paradigm that can efficiently support the integration of a massive number of cores on a chip by decoupling the on-chip computation and communication infrastructure, thereby overcoming the scalability issues faced by conventional buses.
Many scientific computing disciplines, such as computational biology, have seen a significant increase in the availability of parallel algorithms and high-performance computing (HPC) tools, owing to the high runtime complexity and/or data-intensive nature of the underlying computation. Software-only solutions are likely to be inadequate, creating the need for hardware accelerators. This dissertation explores the design and development of highly optimized NoC-based hardware accelerators for a particular class of biocomputing applications, viz. phylogeny reconstruction, which is important for evolutionary inference in computational biology. It focuses on two computationally distinct phylogeny reconstruction approaches to demonstrate that NoC-based many-core platforms can deliver orders-of-magnitude reductions in time-to-solution compared to existing approaches.
The Maximum Parsimony (MP) phylogeny reconstruction problem can be reduced to one of solving numerous instances of the classical Traveling Salesman Problem (TSP). Computing these TSP instances, whose solution typically involves applying branch-and-bound runtime heuristics, accounts for 99% of the total software runtime. This dissertation presents the design of many-core systems with a core-level pipelined micro-parallel architecture and different interconnection topologies to achieve significant speedup and energy efficiency.
In Maximum Likelihood (ML) phylogeny reconstruction, the improved quality of results comes at a higher computational cost, as this approach involves optimization over a multi-dimensional, real-valued continuous space. We present NoC-based hardware accelerators that target the function kernels contributing the bulk of the runtime. These platforms combine novel ideas and approaches, such as space-filling Hilbert curves, parallelized core allocation schemes, and 3-D integration. We also explore the use of long-range on-chip wireless links on existing regular topologies to reduce the network diameter, thereby reducing the average communication latency between cores. These platforms have the potential to serve a broader class of throughput-oriented HPC applications.
Table of Contents