Inter-generational improvements in CPU performance and power consumption, previously made possible by semiconductor scaling, are dwindling, which motivates hardware specialization for a wide variety of tasks. In recent years, implementing certain algorithms as specialized circuits ("accelerators") has been shown to improve both performance and power/energy efficiency compared to the equivalent CPU implementation. One particular domain that shows great promise for hardware specialization is automata processing. Finite automata are most commonly known as the back-end data structures for regular expressions, which are used in a wide variety of applications such as antivirus file scanning and packet payload inspection for network intrusion detection systems (NIDS). Several research efforts have extended the applicability of finite automata beyond regular expressions into domains such as machine learning, particle physics, bioinformatics, and pattern mining. The versatility of automata processing, as well as its inefficiency on traditional von Neumann computer architectures, motivates the need for a flexible and high-performance accelerator for these applications.

In this thesis, an FPGA-based automata processing hardware accelerator is implemented in two different configurations: (1) a traditional discrete FPGA accelerator board attached over PCI-Express; and (2) a new tightly-coupled, cache-coherent FPGA accelerator architecture utilizing the Intel Broadwell Xeon CPU + Arria 10 FPGA platform, known as the Hardware Accelerator Research Platform ("HARP").