Text in videos carries important and useful information. By capturing this text, one can process videos for indexing, summarization, content search, and similar tasks. In this paper, we propose a robust system for text detection, localization, segmentation, and removal in video. The detection method uses a gradient discrete cosine transform (DCT) to find text blocks from the texture intensity information of the block DCT. The localization method uses vertical and horizontal difference values to find the boundaries of text strings. Based on this algorithm, a real-time architecture is designed with a pipelined schedule. With some special techniques, the number of memory cells required is greatly reduced. A fast prototype of the text detection has been implemented on the PAC/Due embedded system.
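The detection and localization steps summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the 8x8 block size, the use of summed absolute AC coefficients as the texture measure, the thresholds, and all function names are assumptions made for illustration, and the gradient stage is omitted because its details are not given here.

```python
import math

N = 8  # block size (assumption; not stated in the abstract)

def dct_matrix(n=N):
    """Orthonormal DCT-II basis C, so the 2-D DCT of a block B is C * B * C^T."""
    c = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        c.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i in range(n)])
    return c

def dct2(block, c):
    """2-D DCT of a square block given as a list of rows."""
    n = len(block)
    tmp = [[sum(c[k][i] * block[i][j] for i in range(n)) for j in range(n)]
           for k in range(n)]                      # C * B
    return [[sum(tmp[k][j] * c[l][j] for j in range(n)) for l in range(n)]
            for k in range(n)]                     # (C * B) * C^T

def texture_energy(block, c):
    """Sum of absolute AC coefficients (hypothetical texture measure)."""
    n = len(block)
    coeffs = dct2(block, c)
    return sum(abs(coeffs[u][v]) for u in range(n) for v in range(n)
               if (u, v) != (0, 0))

def detect_text_blocks(image, threshold, n=N):
    """Return (block_row, block_col) of blocks whose texture energy exceeds threshold."""
    c = dct_matrix(n)
    hits = []
    for by in range(len(image) // n):
        for bx in range(len(image[0]) // n):
            block = [row[bx * n:(bx + 1) * n]
                     for row in image[by * n:(by + 1) * n]]
            if texture_energy(block, c) > threshold:
                hits.append((by, bx))
    return hits

def row_profile(image):
    """Per-row sum of absolute horizontal differences."""
    return [sum(abs(r[x + 1] - r[x]) for x in range(len(r) - 1)) for r in image]

def col_profile(image):
    """Per-column sum of absolute vertical differences."""
    h, w = len(image), len(image[0])
    return [sum(abs(image[y + 1][x] - image[y][x]) for y in range(h - 1))
            for x in range(w)]

def bounds(profile, threshold):
    """First and last index where the profile exceeds threshold, or None."""
    idx = [i for i, v in enumerate(profile) if v > threshold]
    return (idx[0], idx[-1]) if idx else None
```

In this sketch, flat blocks produce near-zero AC energy, while blocks with strong stroke-like intensity changes produce large values. The difference profiles then narrow the candidate region to the first and last rows and columns that show significant intensity changes.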