With the growing impact of vision and Artificial Intelligence (AI) technology on quality control, robotic assembly and robot navigation, the hardware implementation of object detection and classification algorithms on embedded platforms has attracted increasing attention. The overarching challenges in this field are the real-time performance of the implementation under optimum resource utilization, its reliability, and the robustness of the underlying algorithm. In this work, a fast and accurate vision-based shape-detection algorithm is proposed and its implementation on a heterogeneous System on Chip (SoC) is discussed. The proposed system computes the centroid distance and its Fourier Transform to extract the object feature vector, and is realized on the Zybo Z7 development board. The ARM processor is responsible for communication with external systems and for writing data to the Block RAM (BRAM), while the control signals for efficient execution of the memory operations are designed and implemented as a Finite State Machine (FSM) in the Programmable Logic (PL) fabric. Shape feature vector determination is accelerated using custom modules developed in Verilog, which exploit the available parallelism and pipeline stages. In addition, industry-standard Advanced eXtensible Interface (AXI) buses are adopted to encapsulate standardized IP cores and to build high-speed data exchange bridges between units within the Zynq-7000. The developed system processes images of size 32 × 64 in real time and generates feature descriptors at a clock rate of 62 MHz. Moreover, the method yields a shape feature vector that is computationally light, scalable and rotation invariant. The hardware design is validated through comparative studies using MATLAB.
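The feature extraction summarized above (centroid distance followed by its Fourier Transform) can be illustrated in software. The following is a minimal NumPy sketch, not the Verilog design described in this work. It assumes the object boundary is already available as an ordered list of (x, y) points, takes the centroid as the mean of those boundary points, and normalizes by the DC term; the function name `centroid_distance_descriptor` and the parameter `n_coeffs` are hypothetical and chosen only for illustration.

```python
import numpy as np

def centroid_distance_descriptor(boundary, n_coeffs=8):
    """Illustrative shape descriptor from an ordered boundary (N x 2 array).

    Assumptions (not taken from the paper): the centroid is the mean of the
    boundary points, and the coefficients are normalized by the DC term.
    """
    pts = np.asarray(boundary, dtype=float)
    centroid = pts.mean(axis=0)
    # Centroid distance signature: one distance per boundary point.
    r = np.linalg.norm(pts - centroid, axis=1)
    # The DFT magnitudes do not change under a circular shift of the signature,
    # so the result does not depend on which boundary point comes first.
    spectrum = np.abs(np.fft.fft(r))
    # Dividing by the DC term removes the dependence on object scale.
    return spectrum[1:n_coeffs + 1] / spectrum[0]

# Example: an ellipse sampled at 64 boundary points.
t = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
shape = np.column_stack([3.0 * np.cos(t), np.sin(t)])

# The same ellipse rotated by 30 degrees, traced from a different start point.
theta = np.deg2rad(30.0)
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
rotated = np.roll(shape @ rotation.T, 5, axis=0)

desc_original = centroid_distance_descriptor(shape)
desc_rotated = centroid_distance_descriptor(rotated)
```

In this sketch the two descriptors agree to within floating-point tolerance, which is the rotation invariance claimed for the feature vector: rotating the object leaves every centroid distance unchanged, and a change of starting point only alters the phase of the DFT, which is discarded by taking magnitudes.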