This paper describes our approach to the SemEval-2017 "Semantic Textual Similarity" and "Multilingual Word Similarity" tasks. In the former, we test our approach in both English and Spanish, using a linguistically rich set of features that range from lexical to semantic. In particular, we try to take advantage of the recent Abstract Meaning Representation and the SMATCH measure. Although we do not achieve state-of-the-art results, we introduce semantic structures into textual similarity and analyze their impact. Regarding word similarity, we target the English language and combine WordNet information with word embeddings. While it does not match the best systems, our approach proved to be simple and effective.
In recent decades, several research areas have seen key improvements due to the appearance of numerous tools made available to the scientific community. For instance, Moses plays an important role in recent developments in machine translation, and Lucene is, without doubt, a widespread tool in information retrieval. The existence of these systems allows for the easy development of baselines; researchers can therefore focus on improving preliminary results instead of spending time developing software from scratch. In addition, the existence of appropriate test collections leads to a straightforward comparison of systems and of their specific components. In this paper we describe Just.Ask, a multi-pronged approach to open-domain question answering. Just.Ask combines rule-based with machine-learning-based components and implements several state-of-the-art strategies in question answering. It also has a flexible architecture that allows for further extensions. Moreover, in this paper we report a detailed evaluation of each of Just.Ask's components. The evaluation is split into two parts: in the first, we use a manually built test collection, the GoldWebQA, intended to evaluate Just.Ask's performance when the information source in use is the Web, without having to deal with its constant changes; in the second, we use a set of questions gathered from the TREC evaluation forum, with a closed text collection, locally indexed and stored, as the information source. This paper thus contributes a benchmark for research on question answering, since both Just.Ask and the GoldWebQA corpus are freely available to the scientific community.
We introduce QGASP, a system that performs question generation using lexical, syntactic, and semantic information. QGASP uses this information both to learn patterns and to generate questions. In this paper, we briefly describe its architecture.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.