This paper reports on the pilot question answering track that was carried out this year within the CLEF initiative. The track was divided into monolingual and bilingual tasks: monolingual systems were evaluated on three non-English European languages (Dutch, Italian and Spanish), while in the cross-language tasks an English document collection constituted the target corpus for Italian, Spanish, Dutch, French and German queries. Participants were given 200 questions for each task and were allowed to submit up to two runs per task, with up to three responses (either exact answers or 50-byte strings) per question. We give here an overview of the track: we report on each task and discuss the creation of the multilingual test sets and the participants' results.
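The abstract does not state the scoring measure, but ranked-response formats of this kind (up to three responses per question) are typically scored with mean reciprocal rank (MRR), as in the TREC QA evaluations. Below is a minimal sketch of MRR under that assumption; the function name and judgement data are illustrative, not the track's official scoring code:

```python
def mean_reciprocal_rank(judgements):
    """Score runs where each question has up to three ranked responses.

    `judgements` is a list of per-question lists of booleans, ordered by
    rank, marking whether each response was judged correct. A question
    contributes 1/rank of its first correct response, or 0 if none is correct.
    """
    total = 0.0
    for responses in judgements:
        for rank, correct in enumerate(responses, start=1):
            if correct:
                total += 1.0 / rank
                break
    return total / len(judgements)

# Hypothetical judgements for three questions, up to three responses each.
runs = [
    [True, False, False],   # correct at rank 1 -> contributes 1.0
    [False, False, True],   # correct at rank 3 -> contributes 1/3
    [False, False, False],  # no correct response -> contributes 0.0
]
print(mean_reciprocal_rank(runs))  # (1.0 + 1/3 + 0.0) / 3 ≈ 0.444
```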
The Recognizing Textual Entailment system presented here is based on a broad-coverage parser used to extract dependency relations; in addition, WordNet relations are used to recognize entailment at the lexical level. The work investigates whether mapping the dependency trees of the text and the hypothesis gives better evidence of entailment than matching plain text alone. While the use of WordNet seems to improve the system's performance, the notion of tree mapping explored here (inclusion) yields no improvement, suggesting that other notions of tree mapping should be explored, such as tree edit distance or tree alignment distance.
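The lexical-level check can be approximated with WordNet's synonymy and hypernymy relations: a hypothesis word is entailed by a text word if it names the same concept or a more general one. The sketch below uses NLTK's WordNet interface; it illustrates the idea only and is not the authors' implementation (the function name `lexically_entails` is hypothetical):

```python
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

def lexically_entails(text_word, hyp_word):
    """True if some sense of `hyp_word` is a synonym or hypernym of some
    sense of `text_word` (e.g. 'cat' lexically entails 'animal')."""
    for t_syn in wn.synsets(text_word):
        for h_syn in wn.synsets(hyp_word):
            if h_syn == t_syn:  # shared synset: synonymy
                return True
            # hypernym closure: all transitive ancestors of the text sense
            if h_syn in t_syn.closure(lambda s: s.hypernyms()):
                return True
    return False

print(lexically_entails("cat", "animal"))  # True: animal is a hypernym of cat
print(lexically_entails("animal", "cat"))  # False: entailment is directional
```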
The general aim of the third CLEF Multilingual Question Answering Track was to set up a common and replicable evaluation framework to test both monolingual and cross-language Question Answering (QA) systems that process queries and documents in several European languages. Nine target languages and ten source languages were exploited to enact 8 monolingual and 73 cross-language tasks. Twenty-four groups participated in the exercise. Overall results showed a general increase in performance compared to the previous year. The best performing monolingual system, irrespective of target language, answered 64.5% of the questions correctly (in the monolingual Portuguese task), while the average of the best performances for each target language was 42.6%. The cross-language step, in contrast, entailed a considerable drop in performance. In addition to accuracy, the organisers also measured the relation between the correctness of an answer and a system's stated confidence in it, showing that the best systems did not always provide the most reliable confidence scores.
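One standard way such a correctness/confidence relation is quantified in QA evaluations is the confidence-weighted score (CWS), adopted from the TREC QA track: answers are ranked by the system's self-reported confidence and precision is averaged over all prefixes of the ranking. A minimal sketch of the computation, with hypothetical confidence values and judgements:

```python
def confidence_weighted_score(answers):
    """`answers` is a list of (confidence, correct) pairs, one per question.

    Questions are sorted by decreasing self-reported confidence; the score
    averages precision-at-i over the ranking, rewarding systems that place
    the answers they are sure of (and that are in fact correct) first.
    """
    ranked = sorted(answers, key=lambda a: a[0], reverse=True)
    correct_so_far = 0
    total = 0.0
    for i, (_, correct) in enumerate(ranked, start=1):
        if correct:
            correct_so_far += 1
        total += correct_so_far / i
    return total / len(ranked)

# A well-calibrated system (high confidence on its correct answers) ...
print(confidence_weighted_score([(0.9, True), (0.8, True), (0.2, False)]))  # ≈ 0.889
# ... outscores one with the same accuracy but misplaced confidence.
print(confidence_weighted_score([(0.9, False), (0.8, True), (0.2, True)]))  # ≈ 0.389
```

The two example calls show why the best-performing systems need not have the best CWS: accuracy is identical in both, but only the calibrated ranking is rewarded.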