With the recent emergence of multicore architectures, the age of multicore computing may already be upon us. This shift may have triggered the evolution of the von Neumann architecture towards a parallel processing paradigm. Cellular automata, inherently decentralized, spatially extended systems consisting of large numbers of simple, identical components with local connectivity, were also proposed by von Neumann in the 1950s and are a potential candidate among the parallel processing alternatives. The spatial parallelism available on field programmable gate arrays (FPGAs) makes them an ideal platform for investigating cellular automata systems as a potential parallel processing paradigm on multicore architectures. We have been experimenting with this idea for some time and report our progress from a single-FPGA to a dual-FPGA cellular automata accelerator implementation. For a D2Q9 lattice Boltzmann method implementation, we achieved an overall speed-up of 2.3 by moving from our Fortran implementation to our single-FPGA implementation. Further, our dual-FPGA implementation achieved a speed-up close to 1.8 over the single-FPGA implementation.