1995
DOI: 10.1093/comjnl/38.1.43

An Algebra for Structured Text Search and a Framework for its Implementation

Abstract: A query algebra is presented that expresses searches on structured text. In addition to traditional full-text boolean queries that search a pre-defined collection of documents, the algebra permits queries that harness document structure. The algebra manipulates arbitrary intervals of text, which are recognized in the text from implicit or explicit markup. The algebra has seven operators, which combine intervals to yield new ones: containing, not containing, contained in, not contained in, one of, both of, foll…

Cited by 106 publications (68 citation statements)
References 12 publications
“…Regions models (Burkowski 1992; Clarke et al 1995; Navarro and Baeza-Yates 1997; Jaakkola and Kilpelainen 1999). Figure 3 shows a fragment from Shakespeare's Hamlet for which we numbered the word positions. The figure shows the region that starts at word 103 and ends at word 131.…”
Section: Region Models (mentioning)
confidence: 99%
“…As above, the query <SPEECH> CONTAINING Hamlet retrieves all speeches that contain the word 'Hamlet'. In later publications Clarke et al (1995) and Jaakkola and Kilpelainen (1999) describe region models that do not distinguish mark-up from content. In their system, the operator FOLLOWED BY is needed to match opening and closing tags, so the query would be somewhat more verbose: (<speech> FOLLOWED BY </speech>) CONTAINING Hamlet. In some region models, such as the model by Clarke et al (1995), the query A AND B does not retrieve the intersection of sets A and B, but instead retrieves the smallest regions that contain a region from both set A and set B.…”
Section: Region Models (mentioning)
confidence: 99%
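The operators mentioned in this statement can be illustrated with a small sketch. It is not taken from any of the cited papers: it assumes a region is an inclusive `(start, end)` pair of word positions, and the function names `containing`, `followed_by`, `both_of` and `minimal` are hypothetical. The brute-force pairing is chosen for readability, not efficiency.

```python
from itertools import product

# Assumption: a region is an inclusive (start, end) pair of word positions.
Region = tuple[int, int]


def minimal(regions):
    """Keep only candidates that do not contain another, different candidate."""
    rs = set(regions)
    return sorted(
        r for r in rs
        if not any(o != r and r[0] <= o[0] and o[1] <= r[1] for o in rs)
    )


def containing(a, b):
    """Regions of a that contain at least one region of b."""
    return [x for x in a if any(x[0] <= y[0] and y[1] <= x[1] for y in b)]


def followed_by(a, b):
    """Smallest regions that begin with a region of a and end with a later region of b."""
    return minimal((x[0], y[1]) for x, y in product(a, b) if x[1] < y[0])


def both_of(a, b):
    """Smallest regions that cover one region of a and one region of b."""
    return minimal(
        (min(x[0], y[0]), max(x[1], y[1])) for x, y in product(a, b)
    )


# Hypothetical positions: two <speech> ... </speech> pairs and one 'Hamlet'.
speech_open = [(1, 1), (10, 10)]
speech_close = [(8, 8), (20, 20)]
hamlet = [(13, 13)]

speeches = followed_by(speech_open, speech_close)
print(speeches)                      # [(1, 8), (10, 20)]
print(containing(speeches, hamlet))  # [(10, 20)]
print(both_of([(3, 3)], [(5, 5), (9, 9)]))  # [(3, 5)]
```

The last call shows the "smallest regions" reading of A AND B: the candidate (3, 9) is dropped because it contains the smaller candidate (3, 5).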
“…A region expressed by the tags can be any meaningful units such as title, section, or sentence. Region Algebra by [8] defined a set of algebraic operators on the sets of regions. Operators in the original algebra are shown in Table 1. Let us denote the set of regions of the start tag <A>, and one of the end tag </A>, by S(<A>), and S(</A>), respectively.…”
Section: Region Algebra (mentioning)
confidence: 99%
“…The major attraction of the framework of region algebra by [8] is in its efficient algorithms for finding regions one by one that satisfy a query formulated in an algebraic formula. The algorithms for Containing( ) and Followedby ( ) are shown in Fig.…”
Section: Region Algebra (mentioning)
confidence: 99%
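As a rough illustration of what finding regions "one by one" might look like, the sketch below streams the results of a containing query with a single forward pass. It is not the algorithm from the figure referred to above. It assumes both inputs are sorted by start position and that the second list holds no nested regions, so its end positions are sorted as well; the name `containing_stream` is hypothetical.

```python
from typing import Iterator

# Assumption: a region is an inclusive (start, end) pair of word positions.
Region = tuple[int, int]


def containing_stream(a: list[Region], b: list[Region]) -> Iterator[Region]:
    """Lazily yield each region of a that contains some region of b.

    Assumes a and b are sorted by start and b has no nested regions,
    so the first b starting at or after a region's start also has the
    smallest end among all such b.
    """
    j = 0
    for start, end in a:
        # Skip b regions that begin before the current a region.
        while j < len(b) and b[j][0] < start:
            j += 1
        if j == len(b):
            return  # no b region can fall inside this or any later a region
        if b[j][1] <= end:
            yield (start, end)


speeches = [(1, 8), (10, 20), (25, 30)]
hamlet = [(13, 13), (40, 40)]
print(list(containing_stream(speeches, hamlet)))  # [(10, 20)]
```

Because the index into the second list only moves forward, each result is produced after a bounded amount of extra scanning, and the caller can stop after the first few answers.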