Text segmentation plays an important role in various Natural Language Processing (NLP) tasks such as summarization, context understanding, document indexing and document noise removal. Previous methods for this task rely on manual feature engineering and suffer from large memory footprints and long execution times. To the best of our knowledge, this paper is the first to present a supervised neural approach for text segmentation. Specifically, we propose an attention-based bidirectional LSTM model where sentence embeddings are learned using CNNs and segments are predicted based on contextual information. This model automatically handles variable-sized context information. Compared to the existing competitive baselines, the proposed model shows a performance improvement of ∼7% in WinDiff score on three benchmark datasets.
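A minimal PyTorch sketch of the pipeline this abstract describes follows: a CNN sentence encoder feeding an attention-augmented bidirectional LSTM that labels each sentence as a segment boundary or not. All layer sizes, filter widths, and the additive attention form are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class CNNSentenceEncoder(nn.Module):
    """Encodes a sentence (a sequence of word embeddings) into a fixed-size
    vector via 1-D convolutions and max-over-time pooling."""
    def __init__(self, emb_dim=100, n_filters=64, widths=(2, 3, 4)):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, n_filters, w) for w in widths])

    def forward(self, words):                  # (batch, seq_len, emb_dim)
        x = words.transpose(1, 2)              # (batch, emb_dim, seq_len)
        pooled = [c(x).max(dim=2).values for c in self.convs]
        return torch.cat(pooled, dim=1)        # (batch, n_filters * len(widths))

class AttnBiLSTMSegmenter(nn.Module):
    """Runs a BiLSTM over the sentence embeddings; an additive attention
    over the hidden states supplies variable-sized context for each
    per-sentence boundary decision."""
    def __init__(self, sent_dim=192, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(sent_dim, hidden, bidirectional=True,
                            batch_first=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.clf = nn.Linear(4 * hidden, 2)    # [state; context] -> boundary / no boundary

    def forward(self, sents):                  # (batch, n_sents, sent_dim)
        h, _ = self.lstm(sents)                # (batch, n_sents, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)
        context = (weights * h).sum(dim=1, keepdim=True).expand_as(h)
        return self.clf(torch.cat([h, context], dim=2))  # per-sentence logits
```

Because the attention weights are computed over however many sentence states the LSTM produces, the same module handles short and long documents without a fixed context window.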
Extractive summarization techniques typically aim to maximize the information coverage of the summary with respect to the original corpus and report accuracy in terms of ROUGE scores. Automated text summarization techniques should also consider the dimensions of comprehensibility, coherence and readability. In the current work, we identify the discourse structure that provides the context for the creation of a sentence. We leverage the information from this structure to frame a monotone (non-decreasing) submodular scoring function for generating comprehensible summaries. Our approach improves the overall comprehensibility of the summary in terms of human evaluation and gives sufficient content coverage with comparable ROUGE scores. We also formulate a metric to measure summary comprehensibility in terms of the Contextual Independence of a sentence. The metric is shown to be representative of human judgement of text comprehensibility.
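The abstract's key optimization property is that a monotone submodular objective can be maximized greedily with a constant-factor guarantee. The sketch below shows that pattern with a simple weighted word-coverage objective as a stand-in; the paper's actual scoring function additionally incorporates discourse structure and contextual independence, which are not reproduced here.

```python
from collections import Counter

def coverage_gain(covered, sentence_words, corpus_counts):
    """Marginal gain of adding a sentence: mass of newly covered words.
    Monotone and submodular, since each word is credited at most once."""
    return sum(corpus_counts[w] for w in set(sentence_words) - covered)

def greedy_summary(sentences, k):
    """Pick k sentences one at a time, each maximizing marginal gain.
    For a monotone submodular objective under a cardinality constraint,
    this greedy scheme carries the classic (1 - 1/e) approximation."""
    corpus_counts = Counter(w for s in sentences for w in s.lower().split())
    covered, summary = set(), []
    candidates = list(sentences)
    for _ in range(min(k, len(candidates))):
        best = max(candidates,
                   key=lambda s: coverage_gain(covered, s.lower().split(),
                                               corpus_counts))
        summary.append(best)
        covered |= set(best.lower().split())
        candidates.remove(best)
    return summary
```

Submodularity (diminishing returns) is what makes the cheap greedy loop safe: once a word is covered, re-selecting sentences that repeat it yields no further gain, which naturally discourages redundancy.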
Neural models for text generation are often designed in an end-to-end fashion, typically with zero control over intermediate computations, limiting their practical usability in downstream applications. In this work, we incorporate explicit mechanisms into neural models to ensure topical continuity, content comprehensiveness and informativeness of automatically generated radiology reports. We propose a method to compute image representations specific to each sentential context, which minimizes the hallucination caused by sequence-to-sequence approaches, and we further eliminate redundant content by exploiting diverse sentence states. We conduct experiments on generating radiology reports from chest x-ray images using MIMIC-CXR. Our model outperforms baselines by up to 18% and 29% on objective metrics for informativeness and content ordering, respectively, and by 16% on human validation.
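One way to realize a sentence-specific image representation is additive attention over CNN region features, queried by the decoder state for the current sentence, as in the PyTorch sketch below. The dimensions and the attention form are assumptions for illustration, not the exact mechanism of the paper.

```python
import torch
import torch.nn as nn

class SentenceContextAttention(nn.Module):
    """Re-weights image region features by what the current sentence-level
    decoder state attends to, yielding one image vector per sentence."""
    def __init__(self, feat_dim=1024, state_dim=512, attn_dim=256):
        super().__init__()
        self.proj_feat = nn.Linear(feat_dim, attn_dim)
        self.proj_state = nn.Linear(state_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, regions, sent_state):
        # regions:    (batch, n_regions, feat_dim) CNN features of the x-ray
        # sent_state: (batch, state_dim) decoder state for the current sentence
        e = torch.tanh(self.proj_feat(regions)
                       + self.proj_state(sent_state).unsqueeze(1))
        alpha = torch.softmax(self.score(e), dim=1)   # (batch, n_regions, 1)
        # sentence-specific image representation
        return (alpha * regions).sum(dim=1)           # (batch, feat_dim)
```

Grounding each sentence in its own attended view of the image keeps the decoder from drifting onto findings the image does not support, and comparing the attended vectors across sentence states gives a handle for pruning redundant content.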
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.