Event detection and recognition is a complex task consisting of multiple sub-tasks of varying difficulty. In this paper, we present a simple, modular approach to event extraction that allows us to experiment with a variety of machine learning methods for these sub-tasks, as well as to evaluate the impact these sub-tasks have on the performance of the overall task.
Many open-domain question answering systems answer questions by first harvesting a large number of candidate answers and then picking the most promising one from the list. One criterion for this answer selection is type checking: deciding whether the candidate answer is of the semantic type expected by the question. We define a general strategy for building redundancy-based type checkers, built around the notions of comparison set and scoring method: the former provides a set of potential answer types, and the latter is meant to capture the relation between a candidate answer and an answer type. Our focus is on scoring methods. We discuss nine such methods and provide a detailed experimental comparison and analysis of them. We find that the best-performing scoring method performs at the same level as knowledge-intensive methods, although our experiments do not give a clear-cut answer to the question of whether any of the scoring methods we consider should be preferred over the others.
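The interplay between a comparison set and a scoring method can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy counts stand in for whatever redundancy source (for example, corpus or web co-occurrence frequencies) a real system would use, the names `TYPE_COUNTS`, `PAIR_COUNTS`, `conditional_score`, and `type_check` are hypothetical, and the normalised co-occurrence score is a generic choice, not necessarily one of the nine scoring methods compared in the paper.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical toy counts standing in for corpus or web frequencies.
TYPE_COUNTS: Dict[str, int] = {"city": 1000, "person": 1200, "date": 800}
PAIR_COUNTS: Dict[Tuple[str, str], int] = {
    ("Amsterdam", "city"): 90,
    ("Amsterdam", "person"): 6,
    ("Amsterdam", "date"): 1,
}


def conditional_score(candidate: str, answer_type: str) -> float:
    """Co-occurrence count of (candidate, type), normalised by type frequency."""
    type_count = TYPE_COUNTS.get(answer_type, 0)
    if type_count == 0:
        return 0.0
    return PAIR_COUNTS.get((candidate, answer_type), 0) / type_count


def type_check(
    candidate: str,
    expected_type: str,
    comparison_set: Iterable[str],
    score: Callable[[str, str], float] = conditional_score,
) -> bool:
    """Accept the candidate if the expected type scores highest in the comparison set."""
    scores = {t: score(candidate, t) for t in comparison_set}
    if not scores:
        return False
    best = max(scores, key=scores.get)
    # Reject when there is no evidence at all for any type.
    return scores[best] > 0 and best == expected_type


comparison_set = ["city", "person", "date"]
accepted = type_check("Amsterdam", "city", comparison_set)
rejected = type_check("Amsterdam", "person", comparison_set)
```

Because the scoring function is passed in as a parameter, alternative scoring methods can be swapped in and compared against the same comparison set without changing the checker itself.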
In this paper, we describe how the LIDAS System (Linguistic Discourse Analysis System), a discourse parser built as an implementation of the Unified Linguistic Discourse Model (U-LDM), uses information from sentential syntax and semantics along with lexical semantic information to build the Open Right Discourse Parse Tree (DPT) that serves as a representation of the structure of the discourse (Polanyi et al., 2004; Thione, 2004a,b). More specifically, we discuss how discourse segmentation, sentence-level discourse parsing, and text-level discourse parsing depend on the relationship between sentential syntax and discourse. Discourse rules that draw on syntactic information are used to identify possible attachment points and attachment relations for each Basic Discourse Unit in the DPT.
We present TweetMotif, an exploratory search application for Twitter. Unlike traditional approaches to information retrieval, which present a simple list of messages, TweetMotif groups messages by frequent significant terms (a result set’s subtopics), which facilitate navigation and drilldown through a faceted search interface. The topic extraction system is based on syntactic filtering, language modeling, near-duplicate detection, and set cover heuristics. We have used TweetMotif to deflate rumors, uncover scams, summarize sentiment, and track political protests in real time. A demo of TweetMotif, plus its source code, is available at http://tweetmotif.com.
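To make the set cover step concrete, the following sketch shows the standard greedy heuristic applied to topic selection: repeatedly pick the topic that covers the most messages not yet covered. This is a generic illustration, not TweetMotif's actual implementation; the function name `greedy_topic_cover`, the `max_topics` parameter, and the toy topic-to-message mapping are all hypothetical.

```python
from typing import Dict, List, Set


def greedy_topic_cover(
    topic_to_messages: Dict[str, Set[int]], max_topics: int
) -> List[str]:
    """Greedy set cover: pick topics that cover the most still-uncovered messages."""
    uncovered: Set[int] = set().union(*topic_to_messages.values())
    chosen: List[str] = []
    while uncovered and len(chosen) < max_topics:
        remaining = [t for t in topic_to_messages if t not in chosen]
        if not remaining:
            break
        # Topic with the largest number of uncovered messages (first wins on ties).
        best = max(remaining, key=lambda t: len(topic_to_messages[t] & uncovered))
        if not topic_to_messages[best] & uncovered:
            break  # no remaining topic adds coverage
        chosen.append(best)
        uncovered -= topic_to_messages[best]
    return chosen


# Hypothetical candidate topics mapped to the ids of messages that mention them.
topics = {
    "protest": {1, 2, 3},
    "election": {3, 4},
    "rally": {2, 3},
}
selected = greedy_topic_cover(topics, max_topics=3)
```

In this toy input, "rally" is never selected because every message it covers is already covered by "protest", which is the kind of redundancy a set cover heuristic is meant to suppress in a faceted result list.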