Retrieval activities in a database consisting of heterogeneous collections of structured text

Burkowski, Forbes J.

doi:10.1145/133160.133185

Cited by 51 publications (37 citation statements)

References 15 publications


“…At the end of the 1980's, researchers at the University of Waterloo in Canada researched database support for the creation of an electronic version of the Oxford English Dictionary. This resulted in a number of models for querying and manipulating content and hierarchical structure such as the parsed strings model [10], PAT expressions [15], the containment model [5] and generalized concordance lists model [7]. Similar approaches were developed elsewhere, such as the proximal nodes model [13] and the nested region model [11].…”

Section: Historical Background; mentioning (confidence: 99%)

“…An interesting approach is suggested by Alink [1], who introduces additional XPath steps (select-narrow and select-wide) that navigate from one hierarchy to another. For instance, the following XQuery Full-Text-like query fragment navigates from the paragraph elements to another hierarchy with a Verb element that contains "killed", and to a hierarchy with a person element that contains "Abraham Lincoln": […] The need for multiple hierarchies is addressed, for instance, in the containment model [5] and the proximal nodes model [13]. In several publications, the hierarchies are called "stand-off annotation" or "offset annotation" to stress that the structural information (or annotations) is modeled separately from the textual data.…”

mentioning (confidence: 99%)
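The following minimal Python sketch illustrates the cross-hierarchy navigation described in this statement. It assumes that each hierarchy is stored as a separate list of stand-off annotations, each holding character offsets into one shared text. The functions `select_narrow` and `select_wide` borrow only the step names. Their semantics here are an assumption for illustration, not Alink's definition: "narrow" means the annotation lies entirely inside the context region, and "wide" means it merely overlaps it. The example text and offsets are also hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A stand-off annotation: a label plus [start, end) character offsets into a shared text."""
    label: str
    start: int
    end: int


def select_narrow(context, layer, label):
    """Annotations in `layer` with `label` lying entirely inside `context` (assumed semantics)."""
    return [a for a in layer
            if a.label == label and context.start <= a.start and a.end <= context.end]


def select_wide(context, layer, label):
    """Annotations in `layer` with `label` that overlap `context` at all (assumed semantics)."""
    return [a for a in layer
            if a.label == label and a.start < context.end and context.start < a.end]


text = "Booth killed Abraham Lincoln. Crowds mourned."

# Three independent hierarchies over the same text (hypothetical offsets).
layout = [Annotation("paragraph", 0, 29), Annotation("paragraph", 30, 45)]
verbs = [Annotation("Verb", 6, 12), Annotation("Verb", 37, 44)]
entities = [Annotation("person", 13, 28)]


def covered(a):
    """The text span an annotation points at."""
    return text[a.start:a.end]


# Paragraphs whose Verb layer holds "killed" and whose entity layer holds "Abraham Lincoln".
hits = [p for p in layout
        if any(covered(v) == "killed" for v in select_narrow(p, verbs, "Verb"))
        and any(covered(e) == "Abraham Lincoln" for e in select_narrow(p, entities, "person"))]
print([covered(p) for p in hits])  # ['Booth killed Abraham Lincoln.']

# An annotation crossing the paragraph boundary is found by select_wide but not select_narrow.
quotes = [Annotation("quote", 21, 35)]
print(len(select_narrow(layout[0], quotes, "quote")), len(select_wide(layout[0], quotes, "quote")))  # 0 1
```

Because each layer is only a list of offsets, new hierarchies can be added without touching the text or the other layers.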

“…Exact matching vs. ranking. Many of the early structured text retrieval models do not consider ranked retrieval results or, if they do, only as an afterthought, i.e., by ranking the retrieval results using a text-only query that disregards the structural conditions in the query [5]. A simple but powerful way to take the structure of the results into account is to apply a standard information retrieval model to the retrieved content, and then propagate element scores or aggregate term weights based on the text structure.…”

mentioning (confidence: 99%)
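The following minimal sketch illustrates score propagation. It assumes a toy element tree and uses a plain term count on each element's own text as a stand-in for a real retrieval model. Each element's score is its own content score plus a decayed sum of its children's scores. The `Element` class, the `propagate` function, and the decay factor of 0.5 are hypothetical choices, not a method taken from the cited work.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """A node in a toy document tree; `text` is the element's own (direct) text only."""
    name: str
    text: str = ""
    children: list = field(default_factory=list)


def tf_score(text, query_terms):
    """Stand-in content score: number of query-term occurrences in the text."""
    tokens = text.lower().split()
    return sum(tokens.count(term) for term in query_terms)


def propagate(elem, query_terms, decay=0.5, results=None):
    """Score = own content score + decay * sum of the children's scores (post-order).

    Every element's (name, score) pair is appended to `results` so all elements can be ranked.
    """
    if results is None:
        results = []
    score = tf_score(elem.text, query_terms)
    for child in elem.children:
        score += decay * propagate(child, query_terms, decay, results)
    results.append((elem.name, score))
    return score


doc = Element("article", children=[
    Element("sec1", children=[
        Element("p1", "formal models for text retrieval"),
        Element("p2", "formal models of structure"),
    ]),
    Element("sec2", children=[Element("p3", "database tuning")]),
])

results = []
propagate(doc, ["formal", "models"], results=results)
ranking = sorted(results, key=lambda r: r[1], reverse=True)
print(ranking)
```

Here sec1 ties with its two matching paragraphs (score 2), while the whole article scores 1.0. The decay factor therefore controls how strongly larger elements are favoured. With decay=1.0, every ancestor would simply accumulate all term counts beneath it.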


Structured Text Retrieval Models

Hiemstra, Baeza-Yates (2017). Encyclopedia of Database Systems.

DEFINITION

Structured text retrieval models provide a formal definition or mathematical framework for querying semistructured textual databases. A textual database contains both content and structure. The content is the text itself, and the structure divides the database into separate textual parts and relates those textual parts by some criterion. Often, textual databases can be represented as marked-up text, for instance as XML, where the XML elements define the structure on the text content. Retrieval models for textual databases should comprise three parts: 1) a model of the text, 2) a model of the structure, and 3) a query language [4]. The model of the text defines a tokenization into words or other semantic units, as well as stop words, stemming, synonyms, etc. The model of the structure defines parts of the text, typically a contiguous portion of the text called an element, region, or segment, which is defined on top of the text model's word tokens. The query language typically defines a number of operators on content and structure, such as set operators and operators like "containing" and "contained-by", to model relations between content and structure, as well as relations between the structural elements themselves.

Using such a query language, an (expert) user can, for instance, formulate requests like "I want a paragraph discussing formal models near a table discussing the differences between databases and information retrieval". Here, "formal models" and "differences between databases and information retrieval" should match the content to be retrieved from the database, whereas "paragraph" and "table" refer to structural constraints on the units to retrieve. The features, structuring power, and expressiveness of the query languages of several models for structured text retrieval are discussed below.
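The three-part decomposition above can be sketched in Python as a tiny region algebra. The sketch assumes that regions are inclusive (start, end) word positions over the token list and that the structure regions are assigned by hand. The stop list, the function names, and the toy text are hypothetical. This is not the query language of any particular model discussed here.

```python
import re

STOP_WORDS = {"the", "a", "of", "and"}  # hypothetical stop list; no stemming or synonyms


def tokenize(text):
    """Text model: lowercase alphabetic tokens with stop words removed.

    A token's index in the returned list is its word position.
    """
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def word(tokens, term):
    """One-token (start, end) regions for every occurrence of `term`."""
    return [(i, i) for i, t in enumerate(tokens) if t == term]


def containing(outer, inner):
    """Regions of `outer` that contain at least one region of `inner`."""
    return [a for a in outer if any(a[0] <= b[0] and b[1] <= a[1] for b in inner)]


def contained_by(inner, outer):
    """Regions of `inner` that lie within at least one region of `outer`."""
    return [a for a in inner if any(b[0] <= a[0] and a[1] <= b[1] for b in outer)]


def intersection(x, y):
    """Set operator on region lists."""
    return sorted(set(x) & set(y))


tokens = tokenize("Formal models of structured text. "
                  "The table compares databases and information retrieval.")
# Structure model: inclusive (start, end) word-position regions per element name (hand-assigned).
regions = {"section": [(0, 8)], "paragraph": [(0, 3)], "table": [(4, 8)]}

# "paragraph containing formal AND paragraph containing models"
paras = intersection(containing(regions["paragraph"], word(tokens, "formal")),
                     containing(regions["paragraph"], word(tokens, "models")))
# "(table containing databases) contained-by section"
tables = contained_by(containing(regions["table"], word(tokens, "databases")), regions["section"])
print(paras, tables)  # [(0, 3)] [(4, 8)]
```

The "near" constraint in the example request would need an additional distance-based operator, which this sketch leaves out.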
HISTORICAL BACKGROUND

The STAIRS system (Storage and Information Retrieval System), which was developed at IBM as early as the late 1950s, allowed querying both content and structure. Much like today's Online Public Access Catalogues, it was used to store bibliographic data in records with fields such as keywords and title; it provided structured search but supported neither overlapping or hierarchical structures nor full-text search. At the end of the 1980s, researchers at the University of Waterloo in Canada investigated database support for the creation of an electronic version of the Oxford English Dictionary. This resulted in a number of models for querying and manipulating content and hierarchical structure, such as the parsed strings model [10], PAT expressions [15], the containment model [5] and the generalized concordance lists model [7]. Similar approaches were developed elsewhere, such as the proximal nodes model [13] and the nested region model [11]. Interest in structured text retrieval models has grown since the introduction of XML in 1998 and the emergence of standard data retrieval query languages for XML data (see XPATH/XQUERY). One might argue that the structured text retrieval approaches such...


Section: Historical Background; mentioning (confidence: 99%); two further statements, mentioning (confidence: 99%)


Structured Text Retrieval Models

Hiemstra, Baeza-Yates (2017). Encyclopedia of Database Systems.


“…Region models (Burkowski 1992; Clarke et al. 1995; Navarro and Baeza-Yates 1997; Jaakkola and Kilpeläinen 1999). Figure 3 shows a fragment from Shakespeare's Hamlet for which we numbered the word positions. The figure shows the region that starts at word 103 and ends at word 131.…”

Section: Region Models; mentioning (confidence: 99%)

“…Systems that support region queries can process complex queries, such as the following, which retrieves all lines in which Hamlet says "farewell": (<LINE> CONTAINING farewell) CONTAINED BY (<SPEECH> […] There are several proposals of region models that differ slightly. For instance, the model proposed by Burkowski (1992) implicitly distinguishes mark-up from content. As above, the query <SPEECH> CONTAINING Hamlet retrieves all speeches that contain the word 'Hamlet'.…”

Section: Region Models; mentioning (confidence: 99%)
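The word-position view of regions and the nested Hamlet query can be sketched in Python. The sketch assumes a toy XML-like fragment (not text from the play) in which tags take up no word positions, loosely echoing the separation of mark-up and content attributed to Burkowski (1992) above. The quoted query is cut off after `(<SPEECH>`, so the sketch first uses all SPEECH regions as the outer operand. It then, as its own assumption, restricts them to speeches whose SPEAKER contains "hamlet". `index_regions` and the other helpers are hypothetical names.

```python
import re


def index_regions(markup):
    """Number content words from 0 and record each element as an inclusive (start, end) region.

    Tags do not occupy word positions, so mark-up is kept apart from content.
    Assumes well-formed mark-up in which every element contains at least one word.
    """
    words, regions, open_tags = [], {}, []
    for slash, name, token in re.findall(r"<(/?)(\w+)>|(\w+)", markup):
        if token:                       # content word: give it the next position
            words.append(token.lower())
        elif slash:                     # closing tag: region ends at the last word seen
            open_name, start = open_tags.pop()
            regions.setdefault(open_name, []).append((start, len(words) - 1))
        else:                           # opening tag: region starts at the next word
            open_tags.append((name, len(words)))
    return words, regions


def word(words, term):
    return [(i, i) for i, w in enumerate(words) if w == term]


def containing(outer, inner):
    return [a for a in outer if any(a[0] <= b[0] and b[1] <= a[1] for b in inner)]


def contained_by(inner, outer):
    return [a for a in inner if any(b[0] <= a[0] and a[1] <= b[1] for b in outer)]


markup = """<PLAY>
<SPEECH><SPEAKER>Hamlet</SPEAKER><LINE>farewell then</LINE><LINE>remember this</LINE></SPEECH>
<SPEECH><SPEAKER>Horatio</SPEAKER><LINE>farewell my lord</LINE></SPEECH>
<PROLOGUE><LINE>farewell all</LINE></PROLOGUE>
</PLAY>"""

words, regions = index_regions(markup)
farewell_lines = containing(regions["LINE"], word(words, "farewell"))

# (<LINE> CONTAINING farewell) CONTAINED BY <SPEECH>: drops the prologue line.
in_speeches = contained_by(farewell_lines, regions["SPEECH"])

# Assumed outer operand: speeches whose SPEAKER contains "hamlet".
hamlet_speeches = containing(regions["SPEECH"], containing(regions["SPEAKER"], word(words, "hamlet")))
by_hamlet = contained_by(farewell_lines, hamlet_speeches)

print(in_speeches, by_hamlet)  # [(1, 2), (6, 8)] [(1, 2)]
print(" ".join(words[1:3]))    # farewell then
```

Since each region is just a pair of word positions, containment reduces to two integer comparisons. This is what lets nested queries like the one above be evaluated by composing simple filters.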

Information Retrieval Models

Hiemstra (2009). Information Retrieval.

Properties‐based retrieval and user decision states: User control and behavior modeling

Benoît (2004). J. Am. Soc. Inf. Sci.

As retrieval set size in information retrieval (IR) becomes larger, users may need greater interactive opportunities to determine for themselves the potential relevance of the resources offered by a given collection. A parts-of-document approach, coupled with an interactive graphic interface and control panel, permits end users to tailor the information seeking (IS) session. Applying the model described by the author in a previous paper in this journal, this paper explores two issues: whether a group of information seekers in the same research domain will want to use this type of IR interaction, and whether such interaction is more successful than relevance-ranked lists based on the general vector model. In addition, the paper proposes the use of gradient space as a means of capturing end users' cognitive states (decision-making points) during a parts-of-document-based IR session. It concludes that, for a group of biomedical researchers, a parts-of-document approach is preferred in certain IR situations and that gradient space provides system designers with empirical evidence suited for systems analysis.

