Because the recognition rate for degraded Chinese documents is low, retrieval performs poorly when it is based directly on OCR results. This paper presents a new way to improve retrieval performance through a fuzzy coding strategy: character classes with similar shapes are clustered, and each cluster is indexed by a pseudo code. To ease testing, this paper also presents a way to generate ground truth for imaged documents and to synthesize degraded document images. A real OCR text collection and two synthesized document image collections are used for performance evaluation, and the results confirm the validity of our method.
Keywords-Retrieval of degraded Chinese documents; fuzzy coding strategy; synthesis of degraded documents
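The fuzzy coding idea summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the clusters in `SIMILAR_SHAPE_CLUSTERS`, the `#k` pseudo-code format, the bigram index, and all function names are assumptions chosen for illustration only. The sketch maps every character in a similar-shape cluster to one shared pseudo code, indexes documents by bigrams of those codes, and encodes the query the same way, so an OCR confusion inside a cluster no longer prevents a match.

```python
from collections import defaultdict

# Hypothetical clusters of visually similar characters (illustrative only;
# the paper derives its clusters from character shape, not from this list).
SIMILAR_SHAPE_CLUSTERS = [
    ["己", "已", "巳"],
    ["未", "末"],
    ["日", "曰"],
]


def build_code_table(clusters):
    """Map each clustered character to a shared pseudo code such as '#0'."""
    table = {}
    for cluster_id, members in enumerate(clusters):
        for ch in members:
            table[ch] = f"#{cluster_id}"
    return table


def fuzzy_encode(text, table):
    """Replace clustered characters by their pseudo code; keep the others."""
    return tuple(table.get(ch, ch) for ch in text)


def build_index(docs, table, n=2):
    """Build an inverted index from fuzzy-coded n-grams to document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        codes = fuzzy_encode(text, table)
        for i in range(len(codes) - n + 1):
            index[codes[i:i + n]].add(doc_id)
    return index


def search(query, index, table, n=2):
    """Return ids of documents containing every fuzzy n-gram of the query."""
    codes = fuzzy_encode(query, table)
    grams = [codes[i:i + n] for i in range(len(codes) - n + 1)]
    if not grams:  # query shorter than n characters
        return set()
    result = set(index.get(grams[0], set()))
    for gram in grams[1:]:
        result &= index.get(gram, set())
    return result


# Toy OCR output: "已" was misread as "己" and "未" as "末".
OCR_DOCS = {1: "我己经到了", 2: "明天末来"}
TABLE = build_code_table(SIMILAR_SHAPE_CLUSTERS)
INDEX = build_index(OCR_DOCS, TABLE)
```

In this toy setting, `search("已经", INDEX, TABLE)` still finds document 1 even though the OCR text contains "己经", because both characters share one pseudo code. The trade-off is that coarser clusters tolerate more recognition errors but also admit more false matches.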