The first stage of processing in the IBM Watsoni system is to perform a detailed analysis of the question in order to determine what it is asking for and how best to approach answering it. Question analysis uses Watson's parsing and semantic analysis capabilities: a deep Slot Grammar parser, a named entity recognizer, a co-reference resolution component, and a relation extraction component. We apply numerous detection rules and classifiers using features from this analysis to detect critical elements of the question, including: 1) the part of the question that is a reference to the answer (the focus); 2) terms in the question that indicate what type of entity is being asked for (lexical answer types); 3) a classification of the question into one or more of several broad types; and 4) elements of the question that play particular roles that may require special handling, for example, nested subquestions that must be separately answered. We describe how these elements are detected and evaluate the impact of accurate detection on our end-to-end question-answering system accuracy.
Two deep parsing components, an English Slot Grammar (ESG) parser and a predicate-argument structure (PAS) builder, provide core linguistic analyses of both the questions and the text content used by IBM Watsoni to find and hypothesize answers. Specifically, these components are fundamental in question analysis, candidate generation, and analysis of passage evidence. As part of the Watson project, ESG was enhanced, and its performance on Jeopardy!i questions and on established reference data was improved. PAS was built on top of ESG to support higher-level analytics. In this paper, we describe these components and illustrate how they are used in a pattern-based relation extraction component of Watson. We also provide quantitative results of evaluating the component-level performance of ESG parsing.
A key phase in the DeepQA architecture is Hypothesis Generation, in which candidate system responses are generated for downstream scoring and ranking. In the IBM Watsoni system, these hypotheses are potential answers to Jeopardy!i questions and are generated by two components: search and candidate generation. The search component retrieves content relevant to a given question from Watson's knowledge resources. The candidate generation component identifies potential answers to the question from the retrieved content. In this paper, we present strategies developed to use characteristics of Watson's different knowledge sources and to formulate effective search queries against those sources. We further discuss a suite of candidate generation strategies that use various kinds of metadata, such as document titles or anchor texts in hyperlinked documents. We demonstrate that a combination of these strategies brings the correct answer into the candidate answer pool for 87.17% of all the questions in a blind test set, facilitating high end-to-end question-answering performance.
One useful source of evidence for evaluating a candidate answer to a question is a passage that contains the candidate answer and is relevant to the question. In the DeepQA pipeline, we retrieve passages using a novel technique that we call Supporting Evidence Retrieval, in which we perform separate search queries for each candidate answer, in parallel, and include the candidate answer as part of the query. We then score these passages using an assortment of algorithms that use different aspects and relationships of the terms in the question and passage. We provide evidence that our mechanisms for obtaining and scoring passages have a substantial impact on the ability of our question-answering system to answer questions and judge the confidence of the answers. HypothesesWe evaluated the following two hypotheses.Hypothesis 1VAnalyzing passages using a variety of strategies, at different depths of analysis, is significantly more effective than a single scoring strategy. Specifically,
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.