Texts of a particular type exhibit a discernible, predictable schema. These schemata can be delineated, and so provide models of their respective text types that are useful for structuring texts automatically. We have developed a Text Structurer module that recognizes text-level structure; within a larger information retrieval system, it delineates the discourse-level organization of each document's contents. This allows the document components most likely to contain the type of information suggested by the user's query to be weighted more heavily. We chose newspaper text as the first text type to implement. Several iterations of manually coding a randomly chosen sample of newspaper articles enabled us to develop a newspaper text model. This process suggested that our intellectual decomposition of texts relied on six types of linguistic information, which we incorporated into the Text Structurer module. Evaluation of the module's results led to a revision of both the underlying text model and the Text Structurer itself.
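The weighting idea in this abstract can be illustrated with a minimal sketch. It is not the Text Structurer itself: the component labels (`MAIN_EVENT`, `BACKGROUND`), the `Component` class, the `score` function, and the `boost` value are all hypothetical, and simple term matching stands in for whatever retrieval scoring the larger system uses.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class Component:
    """One discourse-level component of a document (labels are hypothetical)."""
    label: str
    text: str


def score(components: List[Component],
          query_terms: Iterable[str],
          preferred_labels: Set[str],
          boost: float = 2.0) -> float:
    """Count query-term matches, weighting matches in preferred components more.

    `preferred_labels` stands for the component types that the query is assumed
    to suggest; `boost` is an arbitrary illustrative weight.
    """
    terms = {t.lower() for t in query_terms}
    total = 0.0
    for comp in components:
        # Crude tokenization: split on whitespace, strip trailing punctuation.
        tokens = (tok.strip(".,;:!?") for tok in comp.text.lower().split())
        matches = sum(1 for tok in tokens if tok in terms)
        weight = boost if comp.label in preferred_labels else 1.0
        total += weight * matches
    return total


article = [
    Component("MAIN_EVENT", "The council approved the budget."),
    Component("BACKGROUND", "The vote followed months of budget debate over the budget."),
]
```

With this toy article, the same query term scores differently depending on which component type is preferred, which is the effect the abstract describes.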
The Pedigree Management and Assessment Framework (PMAF) is a customizable framework for writing, retrieving, and assessing provenance and other metadata that reflects the quality of an information object (such as a document) and the relationships between information objects and resources (such as people and organizations). PMAF stores metadata in a space-efficient format using RDF (Resource Description Framework), and can write and query metadata at a fine-grained level. Once metadata has been stored in PMAF, the user can run a variety of assessments (predefined queries) to reveal particular aspects of the metadata graph. We will demonstrate the PMAF browser interface, which can be used to view the existing metadata graph for an information object; the PMAF assessment interface, which allows the user to select and run predefined queries on the metadata; and the integration of PMAF with a standard document editor and content management system.
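As a rough illustration of storing metadata as RDF-style triples and running predefined queries over them, the following sketch uses a small in-memory triple set. It does not reflect PMAF's actual API or storage format: the `MetadataGraph` class, the `ex:` predicates, and both assessment functions are assumptions made for the example.

```python
from typing import List, NamedTuple, Optional, Set


class Triple(NamedTuple):
    """A subject-predicate-object statement, as in RDF."""
    subject: str
    predicate: str
    obj: str


class MetadataGraph:
    """Hypothetical in-memory stand-in for a provenance metadata store."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add(Triple(subject, predicate, obj))

    def match(self, subject: Optional[str] = None,
              predicate: Optional[str] = None,
              obj: Optional[str] = None) -> List[Triple]:
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        )


def authors_of(graph: MetadataGraph, doc: str) -> List[str]:
    """Illustrative assessment: who authored this information object?"""
    return [t.obj for t in graph.match(subject=doc, predicate="ex:authoredBy")]


def derivation_chain(graph: MetadataGraph, doc: str) -> List[str]:
    """Illustrative assessment: follow ex:derivedFrom links back to the origin."""
    chain: List[str] = []
    seen = {doc}
    current = doc
    while True:
        parents = graph.match(subject=current, predicate="ex:derivedFrom")
        if not parents:
            break
        current = parents[0].obj  # follow the first parent only, for simplicity
        if current in seen:       # guard against cycles in the metadata
            break
        seen.add(current)
        chain.append(current)
    return chain
```

A production system would more likely issue SPARQL queries against an RDF store; the pattern matching here only shows the shape of a predefined query over a metadata graph.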