Idioms are multi-word expressions whose meaning cannot always be deduced from the literal meaning of constituent words. A key feature of idioms that is central to this paper is their peculiar mixture of fixedness and variability, which poses challenges for their retrieval from large corpora using traditional search approaches. These challenges hinder insights into idiom usage, affecting users who are conducting linguistic research as well as those involved in language education. To facilitate access to idiom examples taken from real-world contexts, we introduce an information retrieval system designed specifically for idioms. Given a search query that represents an idiom, typically in its canonical form, the system expands it automatically to account for the most common types of idiom variation including inflection, open slots, adjectival or adverbial modification and passivisation. As a by-product of query expansion, other types of idiom variation captured include derivation, compounding, negation, distribution across multiple clauses as well as other unforeseen types of variation. The system was implemented on top of Elasticsearch, an open-source, distributed, scalable, real-time search engine. Flexible retrieval of idioms is supported by a combination of linguistic pre-processing of the search queries, their translation into a set of query clauses written in a query language called Query DSL, and analysis, an indexing process that involves tokenisation and normalisation. Our system outperformed the phrase search in terms of recall and outperformed the keyword search in terms of precision. Out of the three, our approach was found to provide the best balance between precision and recall. By providing a fast and easy way of finding idioms in large corpora, our approach can facilitate further developments in fields such as linguistics, language education and natural language processing.
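The translation of an idiom query into Query DSL clauses might look roughly like the sketch below. It is illustrative only and is not the authors' expansion procedure: it builds a single `match_phrase` clause with a `slop` value, so that extra words may appear between the idiom's content words. The field name `text`, the `FUNCTION_WORDS` list and the helper `idiom_to_query` are hypothetical, and the sketch assumes the field is indexed with a stemming analyser so that inflected forms are normalised at both index and query time.

```python
import json

# Hypothetical list of function words and placeholders that are dropped
# from the canonical form so that they behave like open slots.
FUNCTION_WORDS = {"a", "an", "the", "one's", "someone's", "someone", "something"}


def idiom_to_query(idiom, field="text", slop=3):
    """Build an Elasticsearch request body for a loosely matched idiom.

    The content words are placed in a match_phrase clause; `slop` sets how
    many position moves are tolerated, which allows inserted modifiers and,
    with larger values, some reordering of the words.
    """
    content_words = [w for w in idiom.lower().split() if w not in FUNCTION_WORDS]
    return {
        "query": {
            "match_phrase": {
                field: {"query": " ".join(content_words), "slop": slop}
            }
        }
    }


if __name__ == "__main__":
    # Print the request body that would be sent to the _search endpoint.
    print(json.dumps(idiom_to_query("spill the beans"), indent=2))
```

A strict phrase search corresponds to `slop=0`, while a plain keyword search ignores word positions altogether; a moderate `slop` sits between the two, which is the trade-off between precision and recall described above.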
The development of CCS projects in the UK will be relatively expensive and times are hard. The British Government continues to maintain a supportive policy towards CCS, but the Treasury's ever-tightening purse strings mean that the levels of funding recently hoped for look unlikely to materialise, at least in the form envisaged. Should CCS receive more funding? It is often hard to argue that it should, and one significant reason for this is that CCS, in common with many of the technologies in the combined energy and environment space, suffers from a prevalence of confusing, and often erroneous, assertions regarding the necessity, or otherwise, of its development. These assertions come from many different sources, and a fair number of them seem to be made for partisan reasons, which can lead to the presentation of intentionally one-sided arguments. The green lobby, for example, for a long time misleadingly represented CCS as being associated uniquely with coal-fired power; this led to many environmentalists adopting a mantra that, as coal-fired power is bad, CCS is bad. The coal industry, understandably, is a staunch supporter of CCS, but one suspects not for environmentally motivated reasons. Politicians, on the left and the right, often seem to have advocated, or vilified, CCS motivated more by lingering alliances and grievances associated with the UK coal industry than by an informed analysis of the utility of the technology. CCS has been vaunted at times as a source of exportable intellectual property and skills that would benefit the economy rather than as a green technology; this led to policy mistakes that were a fundamental factor in the failure of the UK Government's first attempt to procure a CCS demonstration project.
Currently, the potential for the use of carbon dioxide (CO2) injection as a method to increase productivity from hydrocarbon reservoirs is also leading to the use of much hyperbole both for and against the development of CCS, much of it misleading. Too often allegiances and ulterior motives seem to lead to unhelpful and apparently intentionally disingenuous rhetoric being aimed at CCS as a technology: it doesn't work, is exorbitantly expensive, damaging to the economy, ridiculously hazardous or environmentally harmful is the message from one side; whilst a confident assertion that it is proven technology, costs less than renewables, is safe, a potential boon for exports and the only practical saviour of the planet in an increasingly fossil-fuel-hungry world, rings out from the other. The confusing noise that is caused by the grinding of these various axes is both unfortunate, because it increases the probability that poor policy choices will be made, and unnecessary, because the case for CCS is actually extremely clear and simple. It runs as follows. Our elected leaders are advised, and have been convinced, that emissions of anthropogenically produced greenhouse gases, the most significant of which is carbon dioxide, are leading to a potentially extremely damaging rise ...