Because mathematical expressions on the web are not annotated with natural language, searching for them with conventional search engines is difficult. Our method performs web searches using a mathematical term as a query and extracts expressions related to that term from the retrieved PDF files. We convert each PDF to TeX, render the mathematical descriptions in the TeX source as images, and compute image features from those images. A support vector machine (SVM) then classifies the expressions based on these features. Our experimental results show that excluding PDF files derived from presentation slides effectively improves the F-measure, and that the mean reciprocal rank (MRR) is highest when both PDF and HTML files are used.
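The evaluation above is reported in terms of F-measure and MRR. A minimal sketch of the standard definitions follows. It assumes the balanced F-measure (F1), and it assumes each query's results are given as a list of relevance flags in rank order. The function names and input format are illustrative and not taken from the paper.

```python
from typing import Sequence


def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean_reciprocal_rank(rankings: Sequence[Sequence[bool]]) -> float:
    """Average of 1/rank of the first relevant result per query (0 if none is relevant)."""
    if not rankings:
        return 0.0
    total = 0.0
    for flags in rankings:
        for rank, relevant in enumerate(flags, start=1):
            if relevant:
                total += 1.0 / rank
                break
    return total / len(rankings)
```

For example, consider three queries whose first relevant results appear at ranks 2, 1, and nowhere. Their MRR is (1/2 + 1 + 0) / 3 = 0.5.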
Recently, the importance of mathematical information retrieval (MIR) has been recognized, and various methods for retrieving mathematical expressions have been proposed. However, because mathematical expressions on the Web are not annotated with natural language, searching for them with conventional search engines is difficult. Our method aims to help people in various fields who use mathematics as a learning tool. It performs a Web search using a mathematical term as a query and extracts mathematical expression images (math-images) related to the query from the retrieved Web pages. It then presents the top ten math-images together with their surrounding information. The method measures the relevance between a query and a math-image from the following viewpoints:
- whether the math-image appears on a separate line;
- whether the query appears in its neighborhood;
- whether it has appropriate image features;
- whether it appears in the first part of the Web page.

We use a support vector machine (SVM) to determine whether an image has appropriate features. We conducted two experiments with the proposed method: Experiment 1 determined its scoring parameters, and Experiment 2 evaluated it. The results demonstrated the usefulness of the proposed method in terms of accuracy, recall, F-measure, mean reciprocal rank (MRR), and mean average precision (MAP).
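One way to combine the viewpoints above into a single relevance score is a weighted sum, followed by ranking and keeping the top ten. The sketch below rests on several assumptions:
- The linear form and the weights in `WEIGHTS` are hypothetical. The paper determines its own scoring parameters in Experiment 1.
- The SVM's decision is represented by a boolean field, `svm_accepts`, that is assumed to come from a separately trained classifier.
- Page position is encoded as a ratio from 0 (top of the page) to 1 (bottom).

```python
from dataclasses import dataclass


@dataclass
class MathImage:
    url: str
    on_separate_line: bool   # image occupies its own line in the page
    query_nearby: bool       # query term appears in the surrounding text
    svm_accepts: bool        # assumed output of a separately trained SVM on image features
    position_ratio: float    # 0.0 = top of the page, 1.0 = bottom


# Hypothetical weights for illustration only.
WEIGHTS = {"separate_line": 1.0, "query_nearby": 2.0, "svm": 2.0, "position": 1.0}


def relevance_score(img: MathImage, w: dict = WEIGHTS) -> float:
    """Weighted sum of the viewpoints; booleans count as 0 or 1."""
    score = 0.0
    score += w["separate_line"] * img.on_separate_line
    score += w["query_nearby"] * img.query_nearby
    score += w["svm"] * img.svm_accepts
    # Images closer to the beginning of the page receive a larger contribution.
    score += w["position"] * (1.0 - img.position_ratio)
    return score


def top_k(images: list[MathImage], k: int = 10) -> list[MathImage]:
    """Return the k highest-scoring images (ten, as in the abstract, by default)."""
    return sorted(images, key=relevance_score, reverse=True)[:k]
```

A linear score keeps each viewpoint's influence explicit, so a parameter search like the one in Experiment 1 reduces to tuning a few weights.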
Science students increasingly search the web for mathematical terms used in engineering lectures. However, it is not easy to obtain sets of web documents that are coherent and closely related to one another. In this paper, we propose a system that retrieves suitable sets of web pages. The system takes as input a syllabus that represents a lecture and presents sets of mathematical web pages related to that syllabus. It incorporates a mathematical dictionary, language processing algorithms, stop lists, and ranking algorithms. We describe how the system searches for useful sets of web pages and presents them to users, and we report the results of two experiments. The first determined the most suitable values of the system's parameters. In the second, engineering graduate students evaluated the relevance between five syllabi and the presented web pages.
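The abstract names the system's components but not how they interact. The sketch below shows one plausible way a mathematical dictionary and a stop list could turn a syllabus into query terms and then rank candidate pages. It assumes the following:
- The dictionary contents, stop list, and `math_weight` are hypothetical.
- Ranking by weighted term overlap stands in for the paper's unspecified ranking algorithm.

```python
import re


def find_terms(text: str, terms: set[str]) -> set[str]:
    """Return the terms that occur in text as whole words (case-insensitive)."""
    lowered = text.lower()
    return {t for t in terms if re.search(r"\b" + re.escape(t) + r"\b", lowered)}


def syllabus_keywords(syllabus: str, dictionary: set[str],
                      stop_list: set[str]) -> tuple[set[str], set[str]]:
    """Split a syllabus into dictionary math terms and other non-stop-word keywords."""
    math_terms = find_terms(syllabus, dictionary)
    words = set(re.findall(r"[a-z]+", syllabus.lower())) - stop_list
    # Drop single words already covered by a (possibly multi-word) math term.
    covered = {w for t in math_terms for w in t.split()}
    return math_terms, words - covered


def rank_pages(pages: dict[str, str], math_terms: set[str], words: set[str],
               math_weight: float = 2.0) -> list[str]:
    """Order page URLs by weighted overlap with the syllabus keywords (hypothetical scheme)."""
    def score(text: str) -> float:
        return math_weight * len(find_terms(text, math_terms)) + len(find_terms(text, words))
    return sorted(pages, key=lambda url: score(pages[url]), reverse=True)
```

Math terms are weighted more heavily here on the assumption that they carry most of a syllabus's topical content. The paper's actual weighting may differ.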
Mathematical expressions are difficult to find with a text search because they cannot simply be replaced with words. Our research uses an ordinary text search and presents appropriate mathematical expression images (hereinafter, math-images) for input keywords. First, we score all the images in the HTML files and select a set of top-ranking images. The scores are based on three viewpoints that are specific to math-images. Bonus points are then added to these scores. The best three images in the set are presented together with an explanation of the keyword and the surrounding information from the HTML files. We conducted two experiments: one to optimize the parameters of the scoring formula and one to evaluate the effect of the bonus points. On average, 79.5% of the top three images presented were correct.
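The selection step above (a base score, plus bonus points, then the best three) and the reported average correct rate can be sketched as follows. The abstract does not name the three viewpoints or the bonus condition. Both are therefore represented abstractly: `viewpoints` holds hypothetical per-viewpoint scores, and `gets_bonus` is a hypothetical flag. The default weights and bonus value are also assumptions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    viewpoints: tuple[float, float, float]  # hypothetical scores for the three viewpoints
    gets_bonus: bool                        # hypothetical condition for awarding bonus points
    correct: bool = False                   # ground-truth label, used only for evaluation


def score(c: Candidate, weights=(1.0, 1.0, 1.0), bonus: float = 0.5) -> float:
    """Weighted base score over the three viewpoints plus optional bonus points."""
    base = sum(w * v for w, v in zip(weights, c.viewpoints))
    return base + (bonus if c.gets_bonus else 0.0)


def best_three(candidates: list[Candidate], **kw) -> list[Candidate]:
    """Select the three highest-scoring images."""
    return sorted(candidates, key=lambda c: score(c, **kw), reverse=True)[:3]


def average_correct_rate(queries: list[list[Candidate]]) -> float:
    """Mean, over queries, of the fraction of correct images among the best three."""
    rates = []
    for cands in queries:
        top = best_three(cands)
        rates.append(sum(c.correct for c in top) / len(top))
    return sum(rates) / len(rates)
```

Under this sketch, the reported 79.5% corresponds to the value `average_correct_rate` would return over the evaluated keywords.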