In recent years, a growing number of math contents are available on the Web. When conventional search engines deal with mathematical expressions, the two-dimensional structure of mathematical expressions is lost, which results in a low performance of math retrieval. While the retrieval technology specifically designed for mathematical expressions is not mature currently. Aiming at these problems, an improved mathematical expression indexing and matching method was proposed through employing full text index method to deal with the two-dimensional structure of mathematical expressions. Firstly, through the fully consideration of LaTeX formulae' characteristics, a feature representation method of mathematical expressions and a clustering method of feature keywords were put forward. Then, an improved inter-relevant successive trees index model was applied to the construction of the mathematical expression index, in which the cluster algorithm of mathematical expression features was employed to solve the problem of the quantity growth of the trees in processing large amount of formulae. Finally, the matching algorithms of mathematical expressions were given which provide four query modes called exact matching, compatible matching, sub-expression matching and fuzzy matching. In browser/server mode, 110027 formulae were used as experimental samples. The index file size was 29.02 Mb. The average time of retrieval was 1.092 seconds. The experimental result shows the effectiveness of the method.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.