FASTUS is a system for extracting information from free text in English, and potentially other languages as well, for entry into a database, and potentially for other applications. It works essentially as a cascaded, nondeterministic finite state automaton. There are four steps in the operation of FASTUS. In Step 1 sentences are scanned for certain trigger words to determine whether further processing should be done. In Step 2 noun groups, verb groups, and prepositions and some other particles are recognized. The input to Step 3 is the sequence of phrases recognized in Step 2; patterns of interest are identified in Step 3 and corresponding "incident structures" are built up. In Step 4 incident structures that derive from the same incident are identified and merged, and these are used in generating database entries. FASTUS is an order of magnitude faster than any comparable system; it can process a news report in an average of less than eleven seconds. This translates directly into fast development time. In the three and a half weeks between its first use and the MUC-4 evaluation in May 1992, we were able to build up its domain knowledge to a point where it was among the leaders in the evaluation.
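The four steps above can be illustrated with a toy pipeline. This is a minimal sketch, not the FASTUS implementation: Python regular expressions stand in for the finite-state machinery, and the trigger list, lexicon, tag letters, preposition-to-role table, and all function names are hypothetical choices made only for this example.

```python
import re

# Hypothetical toy domain knowledge (assumption, not from FASTUS).
TRIGGERS = {"bombed", "attacked"}
LEXICON = {  # one tag letter per word: D=determiner, N=noun, A=auxiliary, V=verb, P=preposition
    "the": "D", "a": "D",
    "embassy": "N", "guerrillas": "N", "lima": "N",
    "was": "A", "were": "A",
    "bombed": "V", "attacked": "V",
    "by": "P", "in": "P",
}
ROLE_OF_PREP = {"by": "perpetrator", "in": "location"}

# Step 2 pattern over the tag string: noun group, verb group, preposition, anything else.
PHRASE = re.compile(r"(?P<NG>D?N+)|(?P<VG>A*V)|(?P<P>P)|(?P<X>.)")

def words_of(sentence):
    return re.findall(r"[a-z]+", sentence.lower())

def has_trigger(sentence):
    """Step 1: keep only sentences that contain a trigger word."""
    return any(w in TRIGGERS for w in words_of(sentence))

def recognize_phrases(sentence):
    """Step 2: group words into (phrase_type, text) pairs."""
    words = words_of(sentence)
    tags = "".join(LEXICON.get(w, "X") for w in words)  # unknown words get tag X
    return [(m.lastgroup, " ".join(words[m.start():m.end()]))
            for m in PHRASE.finditer(tags)]

def find_incidents(phrases):
    """Step 3: match 'NG VG P NG' over the phrase sequence and build incident structures."""
    incidents = []
    for i in range(len(phrases) - 3):
        (t1, target), (t2, verb), (t3, prep), (t4, filler) = phrases[i:i + 4]
        if (t1, t2, t3, t4) == ("NG", "VG", "P", "NG") and prep in ROLE_OF_PREP:
            incidents.append({"target": target, "act": verb.split()[-1],
                              ROLE_OF_PREP[prep]: filler})
    return incidents

def merge_incidents(incidents):
    """Step 4: merge structures whose shared fields do not conflict."""
    merged = []
    for inc in incidents:
        for existing in merged:
            if all(existing.get(k, v) == v for k, v in inc.items()):
                existing.update(inc)
                break
        else:
            merged.append(dict(inc))
    return merged

def extract(text):
    incidents = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if has_trigger(sentence):
            incidents.extend(find_incidents(recognize_phrases(sentence)))
    return merge_incidents(incidents)
```

Called on "The embassy was bombed by the guerrillas. The weather was mild. The embassy was bombed in Lima.", `extract` skips the second sentence at Step 1 and merges the two remaining incident structures into one record at Step 4. The merge test used here (no conflicting field values) is a deliberately naive stand-in for whatever criteria the real system applies.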
We have analyzed 607 sentences of spontaneous human-computer speech data containing repairs, drawn from a total corpus of 10,718 sentences. We present here criteria and techniques for automatically detecting the presence of a repair, locating it, and making the appropriate correction. The criteria involve integration of knowledge from several sources: pattern matching, syntactic and semantic analysis, and acoustics.
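As a rough illustration of the pattern-matching source of evidence only, the sketch below flags an immediately repeated word sequence and removes the first copy. It is an assumption about what such a cue might look like, not the criteria described in the abstract; the function names and the `max_len` parameter are hypothetical.

```python
def find_repeat(words, max_len=3):
    """Return (start, n) for the first immediately repeated n-gram,
    trying the largest n (up to max_len) first; None if there is none."""
    for n in range(max_len, 0, -1):
        for i in range(len(words) - 2 * n + 1):
            if words[i:i + n] == words[i + n:i + 2 * n]:
                return i, n
    return None

def correct_repeat(sentence):
    """Delete the first copy of a repeated word sequence, if one is found."""
    words = sentence.split()
    hit = find_repeat(words)
    if hit is None:
        return sentence
    start, n = hit
    return " ".join(words[:start] + words[start + n:])
```

A surface test like this would also fire on legitimate repetitions, which is presumably one reason the abstract combines pattern matching with syntactic, semantic, and acoustic knowledge.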
INTRODUCTION
SRI International participated in the MUC-6 evaluation using the latest version of SRI's FASTUS system [1]. The FASTUS system was originally developed for participation in the MUC-4 evaluation [3] in 1992, and the performance of FASTUS in MUC-4 helped demonstrate the viability of finite state technologies in constrained natural-language understanding tasks. The system has undergone significant revision since MUC-4, and it is safe to say that the current system does not share a single line of code with the original. The fundamental ideas behind FASTUS, however, are retained in the current system: an architecture consisting of cascaded finite state transducers, each providing an additional level of analysis of the input, together with merging of the final results. This paper will describe the version of the FASTUS system employed in MUC-6 and highlight the innovations that distinguish it from previous versions described in the literature. SRI used the FASTUS system for each of the MUC-6 tasks: the named entity task, the template entity task, the coreference task, and the scenario template task. Because a single system, with a single configuration, was used to run all the tasks, and because the first three tasks are in some sense prerequisites to the fourth, we will focus our attention in this paper on the scenario template task.
BASIC FASTUS
The SRI FASTUS system is based on a series of finite-state transducers that compute the transformation of text from sequences of characters to domain templates. This architecture has proven to be very flexible, and has been applied with success to a number of different information extraction tasks in widely varying domains. We have applied FASTUS to extraction of information about terrorist incidents [3], extraction of information about joint ventures [2], indexing of legal documents for hypertext, extracting extensive information from military texts (Warbreaker Message Handler), extraction of information from spoken dialogues [4], and a number of other smaller systems and pilot applications. We have applied FASTUS to Japanese texts [2, 4] as well as English.
Each transducer (or "phase") in the series takes the output of the previous phase and maps it into structures that comprise the input to the next phase, or that contain the domain template