In this paper, we propose the first method for automatic Vietnamese medical term discovery and extraction from
clinical texts. The method combines linguistic filtering based on open patterns that we define with nested term extraction and
statistical ranking using C-value. It does not require annotated corpora, external data resources, parameter
settings, or restrictions on term length. Besides its specialization in Vietnamese medical terms, another novelty is that it uses
Pointwise Mutual Information to split nested terms and a disjunctive acceptance condition to extract them. Evaluated on real
Vietnamese electronic medical records, it achieves a precision of about 74% and a recall of about 92%, and it remains stably effective
on small datasets. It outperforms previous work in the same category, that is, methods that use neither annotated corpora nor external data
resources. Our method and empirical evaluation analysis can lay a foundation for further research and development in Vietnamese
medical term discovery and extraction.
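To make the two statistical ingredients named above concrete, the following is a minimal sketch of C-value ranking and Pointwise Mutual Information (PMI) in Python. It is an illustration under stated assumptions, not the authors' implementation: the C-value follows the standard formulation with a log2(|a| + 1) length weight so that single-word terms are not scored zero (an assumption motivated by the claim of no term length restriction), and the helper `weakest_boundary`, which splits a candidate at its lowest-PMI adjacent pair, is a hypothetical splitting rule. All function and parameter names are invented for illustration.

```python
import math


def c_value(term, freq, longer_terms):
    """Standard C-value of a candidate term (a tuple of tokens).

    freq maps each candidate term to its corpus frequency.
    longer_terms lists the candidate terms that contain `term` as a nested part.
    The +1 inside the logarithm is an assumption (see the lead-in).
    """
    weight = math.log2(len(term) + 1)
    if not longer_terms:
        return weight * freq[term]
    # Discount occurrences that are explained by the longer, enclosing terms.
    nested_mean = sum(freq[b] for b in longer_terms) / len(longer_terms)
    return weight * (freq[term] - nested_mean)


def pmi(x, y, unigram, bigram, n_unigrams, n_bigrams):
    """PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) ) from raw counts."""
    pair_count = bigram.get((x, y), 0)
    if pair_count == 0:
        return float("-inf")  # the pair was never observed together
    p_xy = pair_count / n_bigrams
    p_x = unigram[x] / n_unigrams
    p_y = unigram[y] / n_unigrams
    return math.log2(p_xy / (p_x * p_y))


def weakest_boundary(tokens, unigram, bigram, n_unigrams, n_bigrams):
    """Hypothetical splitting rule: return k so that tokens[:k] and tokens[k:]
    are separated at the adjacent pair with the lowest PMI."""
    scores = [
        pmi(tokens[i], tokens[i + 1], unigram, bigram, n_unigrams, n_bigrams)
        for i in range(len(tokens) - 1)
    ]
    return scores.index(min(scores)) + 1
```

A high PMI between neighbouring words suggests they belong to the same term, so cutting at the lowest-PMI pair is one plausible way to expose nested candidates, which can then be ranked by C-value.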
In this paper, we propose an RNN-Transducer model for recognizing Japanese and Chinese offline handwritten text line images. As far as we know, it is the first approach that adopts the RNN-Transducer model for offline handwritten text recognition. The proposed model consists of three main components: a visual feature encoder that extracts visual features from an input image with a CNN and then encodes them with a BLSTM; a linguistic context encoder that extracts and encodes linguistic features from the input image with embedding layers and an LSTM; and a joint decoder that combines the visual and linguistic features and decodes them into the final label sequence through fully connected and softmax layers. The proposed model thus takes advantage of both visual and linguistic information from the input image. In the experiments, we evaluated the proposed model on two datasets: Kuzushiji and SCUT-EPT. Experimental results show that the proposed model achieves state-of-the-art performance on both datasets.
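The joint decoder described above can be sketched in plain Python as follows. This is a minimal illustration, not the proposed model: it assumes the visual and linguistic feature vectors share one dimension and are combined by element-wise addition followed by tanh (a common RNN-Transducer choice that the abstract does not confirm), and it uses a single fully connected layer before the softmax. The names `joint_decoder`, `weights`, and `bias` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def joint_decoder(visual, linguistic, weights, bias):
    """Combine every visual frame with every linguistic state.

    visual:     T vectors of size D (output of the visual feature encoder)
    linguistic: U vectors of size D (output of the linguistic context encoder)
    weights:    V rows of size D, bias: V values (one fully connected layer)
    Returns lattice[t][u], a probability distribution over the V labels.
    """
    lattice = []
    for f_t in visual:
        row = []
        for g_u in linguistic:
            # Assumed combination: element-wise sum followed by tanh.
            hidden = [math.tanh(f + g) for f, g in zip(f_t, g_u)]
            logits = [
                sum(w * h for w, h in zip(w_row, hidden)) + b
                for w_row, b in zip(weights, bias)
            ]
            row.append(softmax(logits))
        lattice.append(row)
    return lattice
```

The resulting T x U lattice of label distributions is what a transducer loss or a decoding search would consume; both are outside the scope of this sketch.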