2020
DOI: 10.1007/978-3-030-45439-5_47

Accelerating Substructure Similarity Search for Formula Retrieval

Abstract: Formula retrieval systems using substructure matching are effective, but suffer from slow retrieval times caused by the complexity of structure matching. We present a specialized inverted index and rank-safe dynamic pruning algorithm for faster substructure retrieval. Formulas are indexed from their Operator Tree (OPT) representations. Our model is evaluated using the NTCIR-12 Wikipedia Formula Browsing Task and a new formula corpus produced from Math StackExchange posts. Our approach preserves the effectivene…

Cited by 21 publications (12 citation statements)
References 21 publications
“…The type must be specified by the user as there may be ambiguities, for example, matrix multiplication can also be the acronym for Artificial Intelligence. The searcher runs different query types on different indexes and uses a dynamic pruning algorithm [11] to generate structure-aware results efficiently.…”
Section: Searcher (mentioning)
confidence: 99%
“…Recent tasks have shown that the top effective formula retrieval systems all take advantage of indexing tokens from structured tree representations [7,9]. Currently, Approach Zero indexes prefix leaf-root paths from formula OPT representations, where each unique path corresponds to an inverted list, similar to regular search engines [11]. More specifically, a LaTeX markup is converted to OPT and then the paths from the leaf to the internal nodes are extracted, for example, x + y = 1 will break down into five prefix paths: x/+/=, x/+, y/+/=, y/+ and 1/= (single token paths will not be generated, and we use "/" to visually separate individual path tokens).…”
Section: Indexer (mentioning)
confidence: 99%
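
The path extraction described in the statement above can be illustrated with a minimal Python sketch. It is not the Approach Zero implementation: the `Node` class, the `prefix_paths` and `build_index` helpers, and the formula ID `"f1"` are hypothetical names introduced here, and the operator tree for x + y = 1 is built by hand rather than parsed from LaTeX.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical operator-tree (OPT) node: a token plus child nodes."""
    token: str
    children: list = field(default_factory=list)


def prefix_paths(root):
    """Return the prefix leaf-root paths of an OPT, skipping single-token paths."""
    paths = []

    def walk(node, ancestors):
        # `ancestors` holds tokens from the parent up to the root, nearest first.
        if not node.children:
            upward = [node.token] + ancestors
            # Emit every prefix that starts at the leaf and has at least 2 tokens.
            for end in range(2, len(upward) + 1):
                paths.append("/".join(upward[:end]))
            return
        for child in node.children:
            walk(child, [node.token] + ancestors)

    walk(root, [])
    return paths


def build_index(formulas):
    """Map each unique path to an inverted list of formula IDs (assumed layout)."""
    index = defaultdict(list)
    for formula_id, root in formulas.items():
        for path in sorted(set(prefix_paths(root))):
            index[path].append(formula_id)
    return index


# Hand-built OPT for x + y = 1: "=" is the root, "+" and "1" are its children.
opt = Node("=", [Node("+", [Node("x"), Node("y")]), Node("1")])
paths = prefix_paths(opt)
index = build_index({"f1": opt})
print(paths)  # ['x/+', 'x/+/=', 'y/+', 'y/+/=', '1/=']
```

The five paths match those listed in the quoted statement, and each one becomes a key of the inverted index, in the same way a term does in a text search engine.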
“…However, the traditional full-text retrieval model for one-dimensional is not effective when facing the special two-dimensional pattern retrieval of mathematical expressions. At present, research studies on mathematical expression retrieval and ranking have been carried out with some progress, and methods and prototype systems [1][2][3][4][5][6] with mathematical retrieval functions have been proposed.…”
Section: Introduction (mentioning)
confidence: 99%
“…But it is very difficult to retrieve scientific documents with mathematical expressions because mathematical expressions are characterized by a complex two-dimensional structure. To date, research on mathematical expression retrieval has achieved abundant results [1][2][3][4][5][6][7].…”
Section: Introduction (mentioning)
confidence: 99%