This paper proposes a novel method for denoising knowledge in large-scale Chinese online encyclopedias. First, the initial similarity between triples is computed by a method that integrates edit distance with the TongYiCiCiLin similarity algorithm. Second, a novel nuclear field-like potential function over Infobox knowledge triples is constructed using the semantic tags of Chinese encyclopedia entries. Finally, large-scale clustering and denoising of knowledge triples are performed with this improved potential function, in order to minimize the influence of massive repetition and ambiguity in the Chinese open encyclopedia knowledge base (KB). The proposed method addresses the problems of semantic duplication, ambiguity, and inappropriate classification of knowledge triples that arise when constructing Chinese KBs. The experimental results indicate that the open-domain Chinese encyclopedia KBs constructed by the proposed method outperform those built with state-of-the-art methods.

INDEX TERMS Knowledge base, online encyclopedia, knowledge denoising, similarity computing, nuclear field-like potential function.
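
The first step, an initial similarity that combines edit distance with a thesaurus-based score, can be illustrated with a minimal sketch. It is not the paper's implementation: the toy synonym table stands in for TongYiCiCiLin (whose real similarity is computed from hierarchical category codes), and the names `TOY_SYNONYM_GROUPS`, `combined_similarity`, and the weight `alpha` are assumptions introduced here for illustration only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Normalize edit distance into a similarity in [0, 1]."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


# Hypothetical stand-in for a thesaurus lookup: terms that share a
# group id are treated as synonyms. Not real TongYiCiCiLin data.
TOY_SYNONYM_GROUPS = {"出生地": "G1", "出生地点": "G1", "籍贯": "G2"}


def synonym_similarity(a: str, b: str) -> float:
    """Return 1.0 if both terms fall in the same toy synonym group, else 0.0."""
    ga, gb = TOY_SYNONYM_GROUPS.get(a), TOY_SYNONYM_GROUPS.get(b)
    return 1.0 if ga is not None and ga == gb else 0.0


def combined_similarity(a: str, b: str, alpha: float = 0.5) -> float:
    """Weighted blend of the two scores; alpha is an assumed weight."""
    return alpha * edit_similarity(a, b) + (1 - alpha) * synonym_similarity(a, b)
```

Under these assumptions, two attribute names such as `出生地` and `出生地点` score highly on both components, while names that share neither characters nor a synonym group score zero. How the paper actually weights or combines the two measures is not stated in the abstract.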