Abstract: The automatic extraction of metadata and other information from scholarly documents is a common task in academic digital libraries, search engines, and document management systems, where it supports document management, categorization, and search. A Web-accessible API can simplify this task by providing a single point of extraction that can be incorporated into multiple document workflows, so that each workflow does not need to implement and support its own extraction functionality. In this paper, we describe CiteSeerExtractor, a RESTful API for scholarly information extraction that exploits the duplication present in scholarly big data by means of a near-duplicate matching backend. The backend stores previously extracted metadata and skips extraction for any document whose metadata has already been extracted. We describe the design, implementation, and functionality of CiteSeerExtractor and show how the duplicate document matching results in a difference of 8.46% in the time required to extract header and citation information from approximately 3.5 million documents compared to a baseline.
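The idea of storing extracted metadata and skipping repeated extraction can be illustrated with a minimal sketch. It is not the CiteSeerExtractor implementation: the class `CachingExtractor`, the function `toy_extract`, and the in-memory dictionary are hypothetical names introduced here, and an exact SHA-1 content hash stands in for the near-duplicate matching that the abstract describes, so only byte-identical documents are treated as duplicates.

```python
import hashlib
from typing import Callable, Dict


class CachingExtractor:
    """Hypothetical wrapper: look up stored metadata before extracting."""

    def __init__(self, extract: Callable[[bytes], dict]) -> None:
        self._extract = extract            # the (expensive) extraction function
        self._cache: Dict[str, dict] = {}  # assumption: in-memory store
        self.hits = 0
        self.misses = 0

    def get_metadata(self, document: bytes) -> dict:
        # Assumption: exact-duplicate key; a near-duplicate matcher would
        # map similar (not just identical) documents to the same entry.
        key = hashlib.sha1(document).hexdigest()
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        metadata = self._extract(document)
        self._cache[key] = metadata
        return metadata


def toy_extract(document: bytes) -> dict:
    """Placeholder extractor: treats the first line as the title."""
    text = document.decode("utf-8", errors="replace")
    first_line = text.splitlines()[0] if text else ""
    return {"title": first_line.strip()}


extractor = CachingExtractor(toy_extract)
doc = b"An Example Title\nBody text of the document."
first = extractor.get_metadata(doc)   # extracted and stored
second = extractor.get_metadata(doc)  # served from the store
```

In this sketch the second call returns the stored result without invoking the extractor again, which is the source of any time saving when a collection contains duplicates.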
Baihua Village is a typical mountain village in the southwestern part of Lujiang county, Longyang District, Baoshan City, Yunnan Province. Residents there made their living from the land, growing sugarcane and maize, with an annual income of no more than 2000 yuan before 2006. Since then, a research institute has designated it as one of the pilot villages for promoting mango growing through science and technology. Under the “One Village, One Product” approach, mango breeding and related techniques have been applied in daily work. Within a few years, a development model has been explored, summarized as: “villages are the main carriers, facilitated by the specialized cooperative for mango growing and backed by science and technology. Technical training serves as the driving force for the leading growers, while sellers act as the bridge linking producers and markets”.