Most of the emerging content-based multimedia technologies are based on efficient methods to solve machine early vision tasks. Among other tasks, object segmentation is perhaps the most important problem in single image processing. The solution of this problem is the key technology of the development of the majority of leading-edge interactive video communication technology and telepresence systems. The aim of this paper is to present a robust framework for real-time object segmentation and tracking in video sequences taken simultaneously from different perspectives. The other contribution of the paper is to present a new dedicated parallel hardware architecture. It's composed of a mixture of Digital Signal Processing (DSP) and Field Programmable Gate Array (FPGA) technologies and uses the Content Addressable Memory (CAM) as a main processing unit. Experimental results indicate that small amount of hardware can deliver real-time performance and high accuracy. This is an improvement over previous systems, where execution time of the second-order using a greater amount of hardware has been proposed.