IBM Research undertook a challenge to build a computer system that could compete at the human champion level in real time on the American TV quiz show, Jeopardy! The extent of the challenge includes fielding a real-time automatic contestant on the show, not merely a laboratory exercise. The Jeopardy! Challenge helped us address requirements that led to the design of the DeepQA architecture and the implementation of Watson. After three years of intense research and development by a core team of about 20 researchers, Watson is performing at human expert levels in terms of precision, confidence, and speed at the Jeopardy! quiz show. Our results strongly suggest that DeepQA is an effective and extensible architecture that may be used as a foundation for combining, deploying, evaluating, and advancing a wide range of algorithmic techniques to rapidly advance the field of QA.
An invaluable portion of scientific data occurs naturally in text form. Given a large unlabeled document collection, it is often helpful to organize this collection into clusters of related documents. By using a vector space model, text data can be treated as high-dimensional but sparse numerical data vectors. It is a contemporary challenge to efficiently preprocess and cluster very large document collections. In this paper we present a time- and memory-efficient technique for the entire clustering process, including the creation of the vector space model. This efficiency is obtained by (i) a memory-efficient multi-threaded preprocessing scheme, and (ii) a fast clustering algorithm that fully exploits the sparsity of the data set. We show that this entire process takes time that is linear in the size of the document collection. Detailed experimental results are presented; a highlight of our results is that we are able to effectively cluster a collection of 113,716 NSF award abstracts in 23 minutes (including disk I/O costs) on a single workstation with modest memory consumption.
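As a rough illustration of such a pipeline (not the paper's multi-threaded preprocessor or its clustering algorithm), the following sketch builds a sparse TF-IDF vector space model and clusters the unit-normalized vectors with k-means using off-the-shelf scikit-learn components; all names and parameters here are assumptions for illustration only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import normalize
from sklearn.cluster import KMeans

def cluster_documents(texts, n_clusters=10):
    # Build a sparse TF-IDF vector space model; memory use scales with the
    # number of non-zero term occurrences, not n_docs * vocabulary_size.
    vectorizer = TfidfVectorizer(stop_words="english", min_df=2)
    X = vectorizer.fit_transform(texts)  # CSR sparse matrix

    # Normalize rows to unit length so Euclidean k-means on the sparse
    # vectors approximates cosine-similarity (spherical) clustering.
    X = normalize(X)

    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = km.fit_predict(X)
    return labels, vectorizer
```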
The first stage of processing in the IBM Watson system is to perform a detailed analysis of the question in order to determine what it is asking for and how best to approach answering it. Question analysis uses Watson's parsing and semantic analysis capabilities: a deep Slot Grammar parser, a named entity recognizer, a co-reference resolution component, and a relation extraction component. We apply numerous detection rules and classifiers using features from this analysis to detect critical elements of the question, including: 1) the part of the question that is a reference to the answer (the focus); 2) terms in the question that indicate what type of entity is being asked for (lexical answer types); 3) a classification of the question into one or more of several broad types; and 4) elements of the question that play particular roles that may require special handling, for example, nested subquestions that must be separately answered. We describe how these elements are detected and evaluate the impact of accurate detection on our end-to-end question-answering system accuracy.
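To make the notions of focus and lexical answer type concrete, here is a toy sketch of rule-based detection over a dependency parse. It uses spaCy rather than Watson's Slot Grammar parser, and the two rules shown are illustrative simplifications of the many detection rules and classifiers described above, not the system's actual logic.

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def analyze_question(clue):
    """Toy focus/LAT detection over a dependency parse (illustrative rules only)."""
    doc = nlp(clue)
    focus, lat = None, None
    for tok in doc:
        # Rule 1: a noun phrase introduced by "this"/"these" is a typical
        # Jeopardy! focus, and its head noun serves as a lexical answer type.
        if tok.lower_ in ("this", "these") and tok.dep_ == "det":
            focus = doc[tok.i : tok.head.i + 1]
            lat = tok.head.lemma_
            break
    # Rule 2: a bare pronoun subject ("he", "it") can be the focus with no LAT.
    if focus is None:
        for tok in doc:
            if tok.pos_ == "PRON" and tok.dep_ in ("nsubj", "nsubjpass"):
                focus = doc[tok.i : tok.i + 1]
                break
    return focus, lat

# Example: analyze_question("This country's capital city is Paris.")
# would be expected to yield focus "This country" and LAT "country".
```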
We describe a system capable of measuring spatially resolved reflectance spectra from 380 to 950 nm and fluorescence excitation-emission matrices from 330 to 500 nm excitation and 380 to 700 nm emission in vivo. System performance was compared to that of a standard scanning spectrofluorimeter. This "FastEEM" system was used to interrogate human normal and neoplastic oral cavity mucosa in vivo. Measurements were made through a fiber-optic probe and require 4 min total measurement time. We present a method based on autocorrelation vectors to identify excitation and emission wavelengths where the spectra of normal and pathologic tissues differ most. The FastEEM system provides a tool with which to study the relative diagnostic ability of changes in absorption, scattering, and fluorescence properties of tissue.
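As a purely illustrative sketch of the underlying idea, one could score each excitation-emission wavelength pair by how strongly it separates normal from pathologic measurements; the difference-of-means statistic used here is an assumption for illustration and is not the autocorrelation-vector method described in the paper.

```python
import numpy as np

def wavelength_separation(normal_eems, pathologic_eems, eps=1e-12):
    """Rank excitation-emission wavelength pairs by class separability.

    normal_eems, pathologic_eems: arrays of shape (n_samples, n_excitation, n_emission).
    Returns an (n_excitation, n_emission) score map (larger = more discriminative).
    Uses a simple difference-of-means statistic, not the paper's
    autocorrelation-vector method, purely to illustrate the idea.
    """
    mu_n = normal_eems.mean(axis=0)
    mu_p = pathologic_eems.mean(axis=0)
    sd = np.sqrt(0.5 * (normal_eems.var(axis=0) + pathologic_eems.var(axis=0))) + eps
    return np.abs(mu_n - mu_p) / sd
```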
Access to a large amount of knowledge is critical for success at answering open-domain questions for DeepQA systems such as IBM Watson. Formal representation of knowledge has the advantage of being easy to reason with, but acquisition of structured knowledge in open domains from unstructured data is often difficult and expensive. Our central hypothesis is that shallow syntactic knowledge and its implied semantics can be easily acquired and can be used in many areas of a question-answering system. We take a two-stage approach to extract the syntactic knowledge and implied semantics. First, shallow knowledge from large collections of documents is automatically extracted. Second, additional semantics are inferred from aggregate statistics of the automatically extracted shallow knowledge. In this paper, we describe in detail what kind of shallow knowledge is extracted, how it is automatically done from a large corpus, and how additional semantics are inferred from aggregate statistics. We also briefly discuss the various ways extracted knowledge is used throughout the IBM DeepQA system.
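The following sketch illustrates the flavor of this two-stage approach under simplifying assumptions: it extracts shallow subject-verb-object frames with spaCy (not the parser or extraction pipeline used in DeepQA) and aggregates their counts, from which soft selectional preferences can then be read off.

```python
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_svo(doc):
    """Yield (subject, verb, object) lemma frames from one parsed sentence."""
    for tok in doc:
        if tok.pos_ == "VERB":
            subjects = [c for c in tok.children if c.dep_ in ("nsubj", "nsubjpass")]
            objects = [c for c in tok.children if c.dep_ in ("dobj", "obj")]
            for s in subjects:
                for o in objects:
                    yield (s.lemma_.lower(), tok.lemma_.lower(), o.lemma_.lower())

def frame_statistics(texts):
    """Stage 2 (toy version): aggregate frame counts over a corpus; frequent
    frames imply soft selectional preferences, e.g. which nouns typically
    fill the object slot of a verb like 'compose'."""
    counts = Counter()
    for doc in nlp.pipe(texts):
        counts.update(extract_svo(doc))
    return counts
```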