Visual feature extraction is widely used in many of computer vision algorithms such as object detection, stereo matching, image matching, and Visual SLAM. Real-time implementation of visual feature extraction suffers from long latency, and heavy computation. GPUs are commonly used to accelerate computationally intensive applications, but they are power hungry devices and, they have a fixed level of parallelism. To achieve processing optimization for such computationally intensive applications, FPGA, and ASIC are more efficient. In this work, an efficient and optimized FPGA architecture is designed to accelerate the computation of visual feature extraction. FAST and BRIEF algorithms are used in this system because they are simple and efficient when working with mobile and embedded systems. The system uses different frequency clocks for input pixels streaming and for processing to prevent stall cycles and achieve high speed. Multi pixels are processed per single clock cycle using optimized parallel and pipelined architecture. The proposed architecture is implemented on TERASIC DE2-115 FPGA board and tested with different image sizes and achieved very high throughput. The system can detect features up to 1000 frames per second for grayscale images with the size of 480 ×480 pixels.