We describe SkillSum, a Natural Language Generation (NLG) system that generates a personalised feedback report for someone who has just completed a screening assessment of their basic literacy and numeracy skills. Because many SkillSum users have limited literacy, the generated reports must be easily comprehended by people with limited reading skills; this is the most novel aspect of SkillSum, and the focus of this paper. We used two approaches to maximise readability. First, for determining content and structure (document planning), we did not explicitly model readability, but rather followed a pragmatic approach of repeatedly revising content and structure following pilot experiments and interviews with domain experts. Second, for choosing linguistic expressions (microplanning), we attempted to formulate explicitly the choices that enhanced readability, using a constraints approach and preference rules; our constraints were based on corpus analysis and our preference rules were based on psycholinguistic findings. Evaluation of the SkillSum system was twofold: it compared the usefulness of NLG technology to that of canned text output, and it assessed the effectiveness of the readability model. Results showed that NLG was more effective than canned text at enhancing users' knowledge of their skills, and also suggested that the empirical 'revise based on experiments and interviews' approach made a substantial contribution to readability as well as our explicit psycholinguistically inspired models of readability choices.
Background: Text definitions for entities within bio-ontologies are a cornerstone of the effort to gain a consensus in understanding and usage of those ontologies. Writing these definitions is, however, a considerable effort and there is often a lag between specification of the main part of an ontology (logical descriptions and definitions of entities) and the development of the text-based definitions. The goal of natural language generation (NLG) from ontologies is to take the logical description of entities and generate fluent natural language. The application described here uses NLG to automatically provide text-based definitions from an ontology that has logical descriptions of its entities, so avoiding the bottleneck of authoring these definitions by hand.
Results: To produce the descriptions, the program collects all the axioms relating to a given entity, groups them according to common structure, realises each group through an English sentence, and assembles the resulting sentences into a paragraph, to form as ‘coherent’ a text as possible without human intervention. Sentence generation is accomplished using a generic grammar based on logical patterns in OWL, together with a lexicon for realising atomic entities. We have tested our output for the Experimental Factor Ontology (EFO) using a simple survey strategy to explore the fluency of the generated text and how well it conveys the underlying axiomatisation. Two rounds of survey and improvement show that overall the generated English definitions are found to convey the intended meaning of the axiomatisation in a satisfactory manner. The surveys also suggested that one form of generated English will not be universally liked; that intrusion of too much ‘formal ontology’ was not liked; and that too much explicit exposure of OWL semantics was also not liked.
Conclusions: Our prototype tools can generate reasonable paragraphs of English text that can act as definitions. The definitions were found acceptable by our survey and, as a result, the developers of EFO are sufficiently satisfied with the output that the generated definitions have been incorporated into EFO. Whilst not a substitute for hand-written textual definitions, our generated definitions are a useful starting point.
Availability: An on-line version of the NLG text definition tool can be found at http://swat.open.ac.uk/tools/. The questionnaire and sample generated text definitions may be found at http://mcs.open.ac.uk/nlg/SWAT/bio-ontologies.html.
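The pipeline described in the Results paragraph (collect an entity's axioms, group them by common structure, realise each group as one sentence, assemble a paragraph) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the tool's actual implementation: the `Axiom` tuple, the pattern names, the `TEMPLATES` and `LEXICON` tables, and the toy axioms are all hypothetical stand-ins for the generic grammar and lexicon the abstract mentions.

```python
from collections import defaultdict
from typing import NamedTuple


class Axiom(NamedTuple):
    """Hypothetical simplified axiom: a logical pattern linking a subject to an object."""
    subject: str
    pattern: str  # e.g. "SubClassOf" or "SomeValuesFrom:has_part"
    obj: str


# Hypothetical templates, one per logical pattern (a stand-in for a generic grammar).
TEMPLATES = {
    "SubClassOf": "{subject} is {objects}",
    "SomeValuesFrom:has_part": "{subject} has as part {objects}",
}

# Hypothetical lexicon mapping atomic entity names to English noun phrases.
LEXICON = {
    "Cell": "a cell",
    "Neuron": "a neuron",
    "Axon": "an axon",
    "Dendrite": "a dendrite",
}


def join_phrases(phrases):
    """Join noun phrases as 'a', 'a and b', or 'a, b and c'."""
    if len(phrases) == 1:
        return phrases[0]
    return ", ".join(phrases[:-1]) + " and " + phrases[-1]


def describe(entity, axioms):
    """Group the entity's axioms by pattern and realise one sentence per group."""
    groups = defaultdict(list)
    for ax in axioms:
        if ax.subject == entity:
            groups[ax.pattern].append(LEXICON[ax.obj])
    sentences = []
    for pattern, objects in groups.items():
        text = TEMPLATES[pattern].format(
            subject=LEXICON[entity], objects=join_phrases(objects)
        )
        sentences.append(text[0].upper() + text[1:] + ".")
    return " ".join(sentences)


# Toy axioms, invented for this example.
AXIOMS = [
    Axiom("Neuron", "SubClassOf", "Cell"),
    Axiom("Neuron", "SomeValuesFrom:has_part", "Axon"),
    Axiom("Neuron", "SomeValuesFrom:has_part", "Dendrite"),
]

print(describe("Neuron", AXIOMS))
# prints: A neuron is a cell. A neuron has as part an axon and a dendrite.
```

Grouping before realisation is what lets the two `has_part` axioms share one sentence instead of producing two repetitive ones.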
Medical information is notoriously difficult to convey to patients because the content is complex, emotionally sensitive, and hard to explain without recourse to technical terms. We describe a pilot system for communicating the contents of electronic health records (EHRs) to patients. It generates two alternative presentations, which we have compared in a preliminary evaluation study: the first takes the form of a monologue, which elaborates the information taken from the patient's EHR by adding explanations of some concepts and procedures; the second takes the form of a scripted dialogue, in which the content is recast as a series of questions, answers and statements assigned to two characters in the dialogue, a senior and a junior nurse. Our discourse planning method designs these presentations in tandem, first producing a monologue plan which is then elaborated into a dialogue plan.
Public information services and documents should be accessible to the widest possible readership. Information in newspapers often takes the form of numerical expressions, which pose comprehension problems for people with limited education. One possible approach to this important social problem is to make numerical information accessible by rewriting difficult numerical expressions in a simpler way. To obtain guidelines for performing this task automatically, we have carried out a survey in which experts in numeracy were asked to simplify a range of proportion expressions, with three readerships in mind: (a) people who did not understand percentages; (b) people who did not understand decimals; (c) more generally, people with poor numeracy. Responses were consistent with our intuitions about how common values are considered simpler and how the value of the original expression influences the chosen simplification.
We describe a computational model for planning phrases like “more than a quarter” and “25.9 per cent” which describe proportions at different levels of precision. The model lays out the key choices in planning a numerical description, using formal definitions of mathematical form (e.g., the distinction between fractions and percentages) and roundness adapted from earlier studies. The task is modeled as a constraint satisfaction problem, with solutions subsequently ranked by preferences (e.g., for roundness). Detailed constraints are based on a corpus of numerical expressions collected in the NumGen project and evaluated through empirical studies in which subjects were asked to produce (or complete) numerical expressions in specified contexts. NumGen (Generating intelligent descriptions of numerical quantities for people with different levels of numeracy; http://mcs.open.ac.uk/sw6629/numgen) was funded by the Economic and Social Research Council under Grant Ref. RES-000-22-2760.
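The idea of treating the task as constraint satisfaction followed by preference ranking can be illustrated with a small sketch. Everything concrete here is an assumption made for the example and not the model from the paper: the `Candidate` fields, the candidate list, the `round_score` scale, the `max_error` constraint and the `plan` function are hypothetical.

```python
from fractions import Fraction
from typing import NamedTuple


class Candidate(NamedTuple):
    form: str         # mathematical form: "fraction" or "percentage"
    value: Fraction   # the value the phrase names
    wording: str      # e.g. "a quarter" or "25.9 per cent"
    round_score: int  # higher means rounder (hypothetical scale)


# Hypothetical candidates; a real system would derive these systematically.
CANDIDATES = [
    Candidate("fraction", Fraction(1, 4), "a quarter", 3),
    Candidate("fraction", Fraction(1, 3), "a third", 3),
    Candidate("percentage", Fraction(25, 100), "25 per cent", 2),
    Candidate("percentage", Fraction(26, 100), "26 per cent", 1),
    Candidate("percentage", Fraction(259, 1000), "25.9 per cent", 0),
]


def hedge(named, actual):
    """Pick a modifier relating the actual value to the named one."""
    if actual == named:
        return ""
    return "more than " if actual > named else "less than "


def plan(actual, allowed_forms, max_error):
    """Filter candidates by hard constraints, then rank them by preferences."""
    # Constraints: the form must be allowed and the value close enough.
    solutions = [
        c for c in CANDIDATES
        if c.form in allowed_forms and abs(c.value - actual) <= max_error
    ]
    # Preferences: rounder values first, ties broken by smaller error.
    solutions.sort(key=lambda c: (-c.round_score, abs(c.value - actual)))
    return [hedge(c.value, actual) + c.wording for c in solutions]


actual = Fraction(259, 1000)
print(plan(actual, {"fraction", "percentage"}, Fraction(1, 100)))
# prints: ['more than a quarter', 'more than 25 per cent', 'less than 26 per cent', '25.9 per cent']
print(plan(actual, {"percentage"}, Fraction(0)))
# prints: ['25.9 per cent']
```

Keeping constraints (which descriptions are admissible at all) separate from preferences (which admissible description is best) means the ranking can change for a different readership without touching the filter.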