Running genetic algorithms (GAs) on custom computing architectures is one way to reduce the execution time of this metaheuristic, which is known to take a considerable amount of time to converge when applied to complex optimization problems. In this work, we present a scalable computing array architecture that accelerates the execution of cellular GAs (cGAs), a variant of GAs that can conveniently exploit the coarse-grain parallelism afforded by custom parallel processing. The proposed architecture targets Xilinx FPGAs and serves as an auxiliary processor for an embedded CPU (MicroBlaze). To handle different optimization problems, we propose a high-level synthesis (HLS) design flow in which the problem-dependent operations are specified in C++ and synthesised to custom hardware, so that only minimal knowledge of digital design for FPGAs is required. The minimum energy broadcast (MEB) problem in wireless ad hoc networks is used as a case study: an existing software implementation of a GA that solves this problem is ported to the proposed computing array to demonstrate both the effectiveness of the architecture and the HLS-based design flow. Implementation results on a Virtex-6 FPGA show significant speedups, while also finding solutions of improved quality.
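To make the cGA structure concrete, the following is a minimal software sketch in C++ of one synchronous generation on a toroidal grid. It is an illustration under stated assumptions, not the proposed hardware architecture: the grid size, chromosome length, operator choices (uniform crossover, single-bit mutation, replace-if-not-worse) and all identifiers are hypothetical, and the `fitness` function is a OneMax placeholder standing in for the problem-dependent operations (for example, an MEB cost function) that the HLS flow would synthesise.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <random>

// Hypothetical sizes, chosen only for illustration.
constexpr std::size_t kRows = 4;
constexpr std::size_t kCols = 4;
constexpr std::size_t kGenes = 32;
static_assert(kGenes <= 32, "crossover mask below draws 32 random bits");

using Chromosome = std::bitset<kGenes>;
using Grid = std::array<std::array<Chromosome, kCols>, kRows>;

// Problem-dependent part (placeholder): OneMax, i.e. the number of set bits.
inline int fitness(const Chromosome& c) { return static_cast<int>(c.count()); }

// Uniform crossover: each gene is taken from parent a or b via a random mask.
inline Chromosome crossover(const Chromosome& a, const Chromosome& b,
                            std::mt19937& rng) {
    const Chromosome mask(rng());
    return (a & mask) | (b & ~mask);
}

// Mutation: flip one randomly chosen gene.
inline void mutate(Chromosome& c, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> pick(0, kGenes - 1);
    c.flip(pick(rng));
}

// One synchronous cGA generation. Each cell mates with the fittest of its
// four von Neumann neighbours (with wrap-around) and keeps the child only if
// it is not worse. Cells read only the previous generation, so the loop body
// is independent per cell and could run in parallel.
inline Grid step(const Grid& cur, std::mt19937& rng) {
    Grid next = cur;
    for (std::size_t r = 0; r < kRows; ++r) {
        for (std::size_t c = 0; c < kCols; ++c) {
            const Chromosome& self = cur[r][c];
            const Chromosome* nb[4] = {
                &cur[(r + kRows - 1) % kRows][c], &cur[(r + 1) % kRows][c],
                &cur[r][(c + kCols - 1) % kCols], &cur[r][(c + 1) % kCols]};
            const Chromosome* mate = nb[0];
            for (const Chromosome* n : nb) {
                if (fitness(*n) > fitness(*mate)) mate = n;
            }
            Chromosome child = crossover(self, *mate, rng);
            mutate(child, rng);
            if (fitness(child) >= fitness(self)) next[r][c] = child;
            assert(fitness(next[r][c]) >= fitness(self));  // never regresses
        }
    }
    return next;
}

// Fill the grid with random chromosomes.
inline Grid random_grid(std::mt19937& rng) {
    Grid g;
    for (auto& row : g) {
        for (auto& cell : row) cell = Chromosome(rng());
    }
    return g;
}

// Best fitness currently present in the grid.
inline int best_fitness(const Grid& g) {
    int best = 0;
    for (const auto& row : g) {
        for (const auto& cell : row) {
            if (fitness(cell) > best) best = fitness(cell);
        }
    }
    return best;
}
```

A caller would seed a `std::mt19937`, build a population with `random_grid`, and apply `step` repeatedly. Because a cell is overwritten only by a child that is at least as fit, the best fitness in the grid cannot decrease from one generation to the next in this sketch.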