We present an approach to the automatic creation of extractive summaries of literary short stories. The summaries are produced with a specific objective in mind: to help a reader decide whether she would be interested in reading the complete story. To this end, the summaries give the user relevant information about the setting of the story without revealing its plot. The system relies on assorted surface indicators about clauses in the short story, the most important of which are those related to the aspectual type of a clause and to the main entities in a story. Fifteen judges evaluated the summaries on a number of extrinsic and intrinsic measures. The outcome of this evaluation suggests that the summaries are helpful in achieving the original objective.
This paper describes the first, three-year phase of a project at the National Research Council of Canada that creates software to assist Indigenous communities in preserving their languages and extending their use. The project aimed to work within the empowerment paradigm, where collaboration with communities and fulfillment of their goals is central. Since many of the technologies we developed were in response to community needs, the project ended up as a collection of diverse subprojects, including the creation of a sophisticated framework for building verb conjugators for highly inflectional polysynthetic languages (such as Kanyen'kéha, in the Iroquoian language family), release of what is probably the largest available corpus of sentences in a polysynthetic language (Inuktut) aligned with English sentences and experiments with machine translation (MT) systems trained on this corpus, free online services based on automatic speech recognition (ASR) for easing the transcription bottleneck for speech recordings, software for implementing text prediction and read-along audiobooks for Indigenous languages, and several other subprojects. Sociolinguistic BackgroundThere are about 70 Indigenous languages from 10 distinct language families currently spoken in Canada (Rice, 2008). Most of these languages have complex morphology; they are polysynthetic or agglutinative. Commonly, a single word carries the meaning of an entire clause in Indo-European languages.
This paper describes a system that produces extractive summaries of short works of literary fiction. The ultimate purpose of produced summaries is defined as helping a reader to determine whether she would be interested in reading a particular story. To this end, the summary aims to provide a reader with an idea about the settings of a story (such as characters, time and place) without revealing the plot. The approach presented here relies heavily on the notion of aspect. Preliminary results show an improvement over two naïve baselines: a lead baseline and a more sophisticated variant of it. Although modest, the results suggest that using aspectual information may be of help when summarizing fiction. A more thorough evaluation involving human judges is under way.
We describe the statistical machine translation system developed at the National Research Council of Canada (NRC) for the Russian-English news translation task of the First Conference on Machine Translation (WMT 2016). Our submission is a phrase-based SMT system that tackles the morphological complexity of Russian through comprehensive use of lemmatization. The core of our lemmatization strategy is to use different views of Russian for different SMT components: word alignment and bilingual neural network language models use lemmas, while sparse features and reordering models use fully inflected forms. Some components, such as the phrase table, use both views of the source. Russian words that remain out-ofvocabulary (OOV) after lemmatization are transliterated into English using a statistical model trained on examples mined from the parallel training corpus. The NRC Russian-English MT system achieved the highest uncased BLEU and the lowest TER scores among the eight participants in WMT 2016.
No abstract
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.