<abstract>
<p>Scientific documents contain a large number of mathematical expressions and texts containing mathematical semantics. Simply using mathematical expressions or text to retrieve scientific documents can hardly meet retrieval needs. The real difficulty in retrieving scientific documents is to effectively integrate mathematical expressions and related textual features. Therefore, this study proposes a multi-attribute scientific documents retrieval and ranking model based on GBDT (gradient boosting decision tree) and LR (logistic regression) by integrating the expressions and text contained in scientific documents. First, the similarities of the five attributes are calculated, including mathematical expression symbols, mathematical expression sub-forms, mathematical expression context, scientific document keywords and the frequency of mathematical expressions. Next, the GBDT model is used to discretize and reorganize the five attributes. Finally, the reorganized features are input into the LR model, and the final retrieval and ranking results of scientific documents are obtained. The experiment in this study was carried out on the NTCIR dataset. The average value of the final MAP@20 of the scientific document recall was 81.92%. The average value of the scientific document ranking nDCG@20 was 86.05%.</p>
</abstract>
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.