Abstract-Field-programmable gate arrays (FPGAs) are increasingly used to implement embedded digital systems, however, the hardware design necessary to do so is time-consuming and tedious. The amount of hardware design can be reduced by employing a microprocessor for less-critical computation in the system. Often this microprocessor is implemented using the FPGA reprogrammable fabric as a soft processor which presently have simple architectures and moderate performance. Our goal is to scale the performance of existing soft processors hence expanding their suitability to more critical computation. To this end we propose extending soft processors with vector extensions to exploit the abundant data parallelism found in many embedded kernels. Such a soft vector processor can execute these kernels much faster than a single-core hence reducing the need for hardware implementations. We observe this improved execution speed through experimentation with vector extended soft processor architecture (VESPA) which is designed, implemented, and evaluated on real FPGA hardware. VESPA is shown to effectively scale performance up to 32 lanes, while providing substantial architectural flexibility to create a fine-grained design space. With these characteristics, and portability across FPGA devices, soft vector processors can provide exact-fit architectures which can efficiently and more easily implement data parallel workloads over custom FPGA hardware design.Index Terms-Customization, design space exploration, field-programmable gate array (FPGA)-based soft-core processors, processor generator.