We propose an unsupervised method for distinguishing literal and non-literal usages of idiomatic expressions. Our method determines how well a literal interpretation is linked to the overall cohesive structure of the discourse. If strong links can be found, the expression is classified as literal; otherwise, as idiomatic. We show that this method can help to distinguish literal from non-literal usages, even for idioms which occur in canonical form.
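The following sketch illustrates the cohesion test described in this abstract. It is a minimal reconstruction, not the paper's implementation: the `embed` lookup stands in for whatever semantic-relatedness resource is used, and the decision rule (does adding the idiom's component words strengthen discourse cohesion?) is a simplified stand-in for the full cohesive structure.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity as a stand-in relatedness score."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def avg_pairwise_cohesion(words, embed):
    """Mean pairwise relatedness over all word pairs found in `embed`."""
    vecs = [embed[w] for w in words if w in embed]
    scores = [cosine(a, b) for i, a in enumerate(vecs) for b in vecs[i + 1:]]
    return sum(scores) / len(scores) if scores else 0.0

def classify_usage(context_words, idiom_content_words, embed, margin=0.0):
    """Literal if the idiom's component words strengthen the cohesive
    structure of the surrounding discourse; idiomatic otherwise."""
    with_idiom = avg_pairwise_cohesion(context_words + idiom_content_words, embed)
    without = avg_pairwise_cohesion(context_words, embed)
    return "literal" if with_idiom - without > margin else "idiomatic"
```

Intuitively, in a literal context ("break the ice" near words like frozen, lake, winter) the idiom's component words relate strongly to the surrounding content words and the difference is positive; in an idiomatic context (say, a first meeting) those links are weak and the score falls below the margin.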
Being able to identify which rhetorical relations (e.g., contrast or explanation) hold between spans of text is important for many natural language processing applications. Using machine learning to obtain a classifier which can distinguish between different relations typically depends on the availability of manually labelled training data, which is very time-consuming to create. However, rhetorical relations are sometimes lexically marked, i.e., signalled by discourse markers (e.g., because, but, consequently etc.), and it has been suggested (Marcu and Echihabi, 2002) that the presence of these cues in some examples can be exploited to label them automatically with the corresponding relation. The discourse markers are then removed and the automatically labelled data are used to train a classifier to determine relations even when no discourse marker is present (based on other linguistic cues such as word co-occurrences). In this paper, we investigate empirically how feasible this approach is. In particular, we test whether automatically labelled, lexically marked examples are really suitable training material for classifiers that are then applied to unmarked examples. Our results suggest that training on this type of data may not be such a good strategy, as models trained in this way do not seem to generalise very well to unmarked data. Furthermore, we found some evidence that this behaviour is largely independent of the classifiers used and seems to lie in the data itself (e.g., marked and unmarked examples may be too dissimilar linguistically and removing unambiguous markers in the automatic labelling process may lead to a meaning shift in the examples).
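As a concrete illustration of the labelling scheme discussed above (after Marcu and Echihabi, 2002), the sketch below auto-labels sentences that contain an unambiguous discourse marker and strips the marker before the example would be used for training. The marker lexicon and the regular expressions are illustrative assumptions, not the original resource.

```python
import re

MARKER_TO_RELATION = {          # unambiguous cues -> relation label (illustrative)
    "because": "explanation",
    "but": "contrast",
    "consequently": "result",
}

def auto_label(sentence):
    """Return (marker-free sentence, relation) if a cue is found, else None."""
    for marker, relation in MARKER_TO_RELATION.items():
        pattern = re.compile(rf"\b{re.escape(marker)}\b,?\s*", re.IGNORECASE)
        if pattern.search(sentence):
            return pattern.sub("", sentence, count=1).strip(), relation
    return None

labelled = [auto_label(s) for s in
            ["The game was cancelled because it rained.",
             "She trained hard, but she lost the final."]]
# The (text, relation) pairs would then train a classifier applied to
# examples with no marker at all.
```

The last step is exactly the one the experiments call into question: if removing the marker shifts an example's meaning, the resulting training data may no longer resemble genuinely unmarked text.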
Traditionally, most research in NLP has focused on propositional aspects of meaning. However, to truly understand language, extra-propositional aspects are equally important, and modality and negation contribute substantially to them. While modality and negation have often been neglected by mainstream computational linguistics, interest has grown in recent years, as evidenced by several annotation projects dedicated to these phenomena. Researchers have started to work on modelling factuality, belief and certainty, detecting speculative sentences and hedging, identifying contradictions, and determining the scope of expressions of modality and negation. In this article, we provide an overview of how modality and negation have been modelled in computational linguistics. This growing interest can be seen as a natural extension of the computational treatment of propositional aspects of meaning, such as semantic role labeling, and as a response to the need for processing extra-propositional aspects of meaning as a further step towards text understanding.

That there is more to meaning than just propositional content is a long-held view. Prabhakaran et al. (2010) illustrate this with the following examples, where the event LAY_OFF(GM, workers) is presented with different extra-propositional meanings:

(1) a. GM will lay off workers.
    b. A spokesman for GM said GM will lay off workers.
    c. GM may lay off workers.
    d. The politician claimed that GM will lay off workers.
    e. Some wish GM would lay off workers.
    f. Will GM lay off workers?
    g. Many wonder whether GM will lay off workers.

Generally speaking, modality is a grammatical category that allows the expression of aspects related to the attitude of the speaker towards her statements in terms of degree of certainty, reliability, subjectivity, sources of information, and perspective. We understand modality in a broad sense, covering related concepts such as 'subjectivity', 'hedging', 'evidentiality', 'uncertainty', 'committed belief' and 'factuality'. Negation is a grammatical category that allows the truth value of a proposition to be changed. More detailed definitions of these concepts, with examples, are presented in Sections 2 and 3.

Modality and negation are challenging phenomena not only from a theoretical perspective but also from a computational point of view. So far, two main tasks have been addressed in the computational linguistics community: (i) the detection of various forms of negation and modality, and (ii) the resolution of the scope of modality and negation cues. While modality and negation tend to be lexically marked, the class of markers is heterogeneous, especially in the case of modality. Determining whether a sentence is speculative or whether it contains negated concepts cannot be achieved by simple lexical look-up of words potentially indicating modality or negation. Modal verbs like might are prototypical modality markers, but they can be used in multiple senses. Multiword...
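To make the look-up argument concrete, the sketch below implements the naive lexicon-based cue detector that the article argues is insufficient. The cue lists are small illustrative assumptions; the point is that such a detector misfires on ambiguous items (e.g., might as a noun, or negation inside fixed phrases like "no doubt"), which is why sense disambiguation and scope resolution are needed on top of it.

```python
# Illustrative baseline only: cue lists are assumptions, not a published lexicon.
NEGATION_CUES = {"not", "no", "never", "without", "n't"}
MODALITY_CUES = {"may", "might", "could", "should", "possibly", "suggest"}

def find_cues(tokens):
    """Flag each token that is a potential negation or modality cue."""
    hits = []
    for i, tok in enumerate(tokens):
        low = tok.lower()
        if low in NEGATION_CUES:
            hits.append((i, tok, "negation"))
        elif low in MODALITY_CUES:
            hits.append((i, tok, "modality"))
    return hits

print(find_cues("GM may not lay off workers".split()))
# -> [(1, 'may', 'modality'), (2, 'not', 'negation')]
# Would also (wrongly) fire on "with all his might" -- the ambiguity the
# article highlights.
```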
This article considers the problem of automatic paragraph segmentation. The task is relevant for speech-to-text applications, whose output transcripts do not usually contain punctuation or paragraph indentation and are therefore difficult to read and process. Text-to-text generation applications (e.g., summarization) could also benefit from an automatic paragraph segmentation mechanism which indicates topic shifts and provides visual targets for the reader. We present a paragraph segmentation model which exploits a variety of knowledge sources (including textual cues and syntactic and discourse-related information) and evaluate its performance across different languages and domains. Our experiments demonstrate that the proposed approach significantly outperforms our baselines and in many cases comes within a few percent of human performance. Finally, we integrate our method with a single-document summarizer and show that it is useful for structuring automatically generated text.
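One plausible reading of such a model is a binary classifier applied at every sentence boundary ("does a new paragraph start here?"). The sketch below shows that framing with a deliberately simplified feature set; the paper's actual textual, syntactic and discourse-related features are richer than what is assumed here.

```python
def boundary_features(prev_sent, sent, position, doc_len):
    """Toy surface features for one candidate paragraph boundary."""
    return {
        "first_word": sent.split()[0].lower(),         # cue words ("however", ...)
        "prev_len": len(prev_sent.split()),            # short sentences often close a paragraph
        "rel_position": position / doc_len,            # distance into the document
        "quote_change": ('"' in prev_sent) != ('"' in sent),  # quotation/speaker shift
    }

def segment(sentences, classifier):
    """Group sentences into paragraphs using a trained boundary classifier
    (any callable mapping a feature dict to True/False)."""
    paragraphs, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        feats = boundary_features(sentences[i - 1], sentences[i], i, len(sentences))
        if classifier(feats):
            paragraphs.append(current)
            current = []
        current.append(sentences[i])
    paragraphs.append(current)
    return paragraphs
```

Framing the task per boundary is what lets the same machinery run over transcripts or summarizer output, where no indentation exists to begin with.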
To date, document clustering by genre or author has been performed mostly by means of stylometric and content features. On the premise that novels are societies in miniature, we build social networks from novels as a strategy to quantify their plot and structure. From each social network, we extract a vector of features which characterizes the novel. We cluster the resulting vectors, and the groups obtained are contrasted in terms of author and genre.
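A minimal version of this pipeline might look as follows, assuming character interactions (e.g., conversation or co-occurrence counts) have already been extracted from the novel; the specific network statistics chosen here are illustrative, not the paper's exact feature set.

```python
import networkx as nx

def novel_features(interactions):
    """Build a character network and summarise it as a feature vector.

    interactions: iterable of (character_a, character_b, weight) triples.
    """
    g = nx.Graph()
    for a, b, w in interactions:
        prev = g.get_edge_data(a, b, {"weight": 0})["weight"]
        g.add_edge(a, b, weight=prev + w)
    n = g.number_of_nodes()
    return [
        n,                                               # cast size
        g.number_of_edges() / max(n, 1),                 # interactions per character
        nx.density(g),                                   # overall connectedness
        nx.average_clustering(g),                        # cliquishness of social circles
        max(dict(g.degree()).values()) / max(n - 1, 1),  # protagonist dominance
    ]

feats = novel_features([("Elizabeth", "Darcy", 30),
                        ("Elizabeth", "Jane", 25),
                        ("Darcy", "Bingley", 12)])
```

Vectors computed this way for many novels can then be clustered (e.g., with sklearn.cluster.KMeans) and the clusters compared against author and genre labels, as the abstract describes.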