Emilia Zawadzka-Gosk scite author profile

Evaluating patients’ experience and satisfaction often calls for analyses of free-text data. Language- and domain-specific information extraction can reduce costly manual preprocessing and enable the analysis of extensive collections of experience-based narratives. The research aims were to (1) elicit free-text narratives about experiences with health services of international students in Poland, (2) develop domain- and language-specific algorithms for the extraction of information relevant for the evaluation of quality and safety of health services, and (3) test the performance of the information extraction algorithms on questions about the patients’ experiences with health services. The materials were free-text narratives about health clinic encounters produced by English-speaking foreigners recalling their experiences (n = 104) in healthcare facilities in Poland. A linguistic analysis of the text collection led to constructing a semantic–syntactic lexicon and a set of lexical-syntactic frames. These were further used to develop rule-based information extraction algorithms in the form of Python scripts. The extraction algorithms generated text classifications according to predefined queries. In addition, the narratives were classified by human readers. The algorithm-based and the human readers’ classifications were highly and significantly correlated (p < 0.01), indicating an excellent performance of the automatic query algorithms. The study results demonstrate that domain-specific and language-specific information extraction from free-text narratives can be used as an efficient and low-cost method for evaluating patient experiences and satisfaction with health services, and can be built into software solutions for quality evaluation in health care.
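The rule-based approach described in this abstract (a lexicon of semantic categories combined with lexical-syntactic frames, applied as Python scripts to answer a predefined query) can be illustrated with a minimal sketch. The lexicon entries, the frame patterns, the query, and the function name `classify_staff_attitude` are all hypothetical and chosen only for illustration; they are not taken from the study.

```python
import re

# Hypothetical mini-lexicon: semantic category -> alternation of surface forms.
# These entries are illustrative assumptions, not the study's lexicon.
LEXICON = {
    "STAFF": r"(?:doctor|nurse|receptionist|staff)",
    "POS_EVAL": r"(?:kind|helpful|friendly|professional)",
    "NEG_EVAL": r"(?:rude|unhelpful|dismissive|impatient)",
}

# Optional intensifier that may precede the evaluative word.
INTENSIFIER = r"(?:(?:very|really|so)\s+)?"

# Hypothetical lexical-syntactic frames of the shape:
#   STAFF + copula + (intensifier) + evaluation
FRAMES = {
    "positive": re.compile(
        rf"\b{LEXICON['STAFF']}\s+(?:was|were|is|are)\s+{INTENSIFIER}{LEXICON['POS_EVAL']}\b",
        re.IGNORECASE,
    ),
    "negative": re.compile(
        rf"\b{LEXICON['STAFF']}\s+(?:was|were|is|are)\s+{INTENSIFIER}{LEXICON['NEG_EVAL']}\b",
        re.IGNORECASE,
    ),
}


def classify_staff_attitude(narrative: str) -> str:
    """Answer the illustrative query 'How was the staff's attitude described?'."""
    # Collect the label of every frame that matches somewhere in the narrative.
    hits = {label for label, frame in FRAMES.items() if frame.search(narrative)}
    if len(hits) == 1:
        return hits.pop()
    return "mixed" if hits else "not mentioned"
```

Labels produced this way for each narrative could then be compared with human readers' classifications of the same query, which is the kind of agreement check the abstract reports.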

Deep Learning and Sub-Word-Unit Approach in Written Art Generation

Wołk

Zawadzka-Gosk

Czarnowski

2019

Automatic poetry generation is a novel and interesting application of natural language processing research. It has become more popular in the last few years due to the rapid development of technology and neural computing power. This line of research can be applied to the study of linguistics and literature, to social science experiments, or simply to entertainment. The most effective known method of artificial poem generation uses recurrent neural networks (RNNs). We also used RNNs to generate poems in the style of Adam Mickiewicz. Our network was trained on the poem 'Sir Thaddeus'. For data pre-processing, we used a specialized stemming tool, which is one of the major innovations and contributions of this work. Our experiment was conducted on the source text divided into sub-word units (at a level of resolution close to syllables). This approach is novel and is not often employed in the published literature. Sub-word units seem to be a natural choice for analysis of the Polish language, as the language is morphologically rich due to cases, gender forms and a large vocabulary. Moreover, 'Sir Thaddeus' contains rhymes, so the analysis of syllables can be meaningful. We verified our model with different settings for the temperature parameter, which controls the randomness of the generated text. We also compared our results with similar models trained on the same text but divided into characters (which is the most common approach alongside the use of full word units). The differences were tremendous. Our solution generated much better poems that were able to follow the metre and vocabulary of the source text.
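The role of the temperature parameter mentioned in this abstract can be shown with a small sketch of temperature-scaled sampling over sub-word units. This is a generic illustration, not the authors' code: the function names, the example units, and the scores are assumptions, and the scores stand in for whatever a trained network would output at one generation step.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, flattened or sharpened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_unit(units, logits, temperature, rng):
    """Draw one sub-word unit according to the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(units, weights=probs, k=1)[0]


# Hypothetical sub-word units and scores, for illustration only.
units = ["li", "two", "oj", "czy"]
logits = [2.0, 1.0, 0.5, 0.1]
rng = random.Random(0)
next_unit = sample_unit(units, logits, temperature=0.5, rng=rng)
```

A low temperature concentrates probability on the highest-scoring unit, which makes the output more predictable, while a high temperature spreads probability more evenly and makes the output more random.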

Deep learning and sub-word-unit approach in written art generation

Wołk

Zawadzka-Gosk

Czarnowski

2019

Preprint

Semantic-Enabled Hybrid Genetic Disease Diagnostics in Next-Generation Sequenced Data

Zawadzka-Gosk

Wołk

2018

csci

Next Generation Sequencing is a technology for genome sequencing used in genetics for the diagnosis of disease. NGS provides a list of all mutations in a genome, so identifying the one that causes a disease is not trivial. A number of applications for variant prioritization were developed, but the data they provide is a suggestion rather than a diagnosis; moreover, they suffer from issues such as identifying a nonpathogenic variant as a causal one or the inability to identify a causal gene. These issues inspired us to create a strategy for variant prioritization, which includes the use of the Exomiser and OMIM Explorer result sets improved by semantic analysis of abstracts and articles freely available from the PubMed and PubMed Central databases. For the wider scope of scientific articles, the Google Scholar repository will be used. The described approach enables us to present the latest and most accurate information about potential pathogenic variants.
