This paper describes a framework for multi-document summarization which combines three premises: coherent themes can be identified reliably; highly representative themes, running across subsets of the document collection, can function as multi-document summary surrogates; and effective end-use of such themes should be facilitated by a visualization environment which clarifies the relationship between themes and documents. We present algorithms that formalize our framework, describe an implementation, and demonstrate a prototype system and interface.
We present the architecture and data model for TEXTRACT, a document analysis framework for text analysis components. The framework and components have been deployed in research and industrial environments for text analysis and text mining tasks.
Users of computerized dictionaries require powerful and flexible tools for analyzing and manipulating the information in them. This paper discusses a system for grammatically describing and parsing entries from machine-readable dictionary tapes and a lexical data base representation for storing the dictionary information. It also describes a language for querying, formatting, and maintaining dictionaries and other lexical data stored with that representation.
Computerist: ... But, great Scott, what about structure? You can't just bang that lot into a machine without structure. Half a gigabyte of sequential file ... Lexicographer: Oh, we know all about structure. Take this entry for example. You see here italics as the typical ambiguous structural element marker, being apparently used as an undefined phrase-entry lemma, but in fact being the subordinate entry headword address preceding the small-cap cross-reference headword address which is nested within the gloss to a defined phrase entry, itself nested within a subordinate (bold lower-case letter) sense section in the second branch of a forked multiple part of speech main entry. Now that's typical of the kind of structural relationship that must be made crystal-clear in the eventual database.