This paper proposes a system that detects and rephrases profanity in Chinese text. Rather than simply masking detected profanity, the system revises the input sentence with inoffensive words while preserving its original meaning. Twenty-nine rephrasing rules were devised after observing sentences on real-world social websites. The overall accuracy of the proposed system is 85.56%.
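The contrast between masking and rephrasing can be illustrated with a minimal sketch. The rule table below is hypothetical and uses English placeholders for readability; it does not reproduce any of the 29 rules from the paper, and regular-expression substitution is only an assumed stand-in for the system's actual detection method.

```python
import re

# Hypothetical rules: (pattern, inoffensive replacement).
# Illustrative placeholders only, not the rules from the paper.
RULES = [
    (re.compile(r"\bdamn good\b"), "very good"),
    (re.compile(r"\bshut up\b"), "please be quiet"),
]

def mask(text):
    """Baseline: overwrite each detected span with asterisks of equal length."""
    for pattern, _ in RULES:
        text = pattern.sub(lambda m: "*" * len(m.group(0)), text)
    return text

def rephrase(text):
    """Replace each detected span with its inoffensive paraphrase.

    Returns the revised text and the number of spans that were rewritten.
    """
    hits = 0
    for pattern, replacement in RULES:
        text, n = pattern.subn(replacement, text)
        hits += n
    return text, hits

if __name__ == "__main__":
    sentence = "this is damn good"
    print(mask(sentence))
    print(rephrase(sentence))
```

Masking hides the offending span but also destroys its meaning, whereas the rephrasing function keeps the sentence readable, which is the behavior the abstract argues for.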
A huge volume of multilingual news articles is reported and disseminated on the Internet. How to extract the key information and save reading time is a crucial issue. This paper proposes an architecture for a multilingual news summarizer, including monolingual and multilingual clustering, a similarity measure among meaningful units, and presentation of summarization results. Translation among news stories, idiosyncrasy among languages, implicit information, and user preference are addressed.
This paper describes the NTOU Chinese spelling check system in the SIGHAN-8 Bakeoff. In addition to the basic architecture of the previous system, which participated in the last two CSC tasks, three new preference rules were proposed to deal with Simplified Chinese characters, variants, sentence-final particles, and DE-particles. A new sentence likelihood function was proposed based on frequencies from a space-removed version of the Google n-gram datasets. Two formal runs were submitted, and the better one was produced by the system using Google n-gram frequency information.
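As a rough illustration of scoring correction candidates by n-gram frequency, the sketch below sums smoothed log counts of character bigrams and keeps the highest-scoring candidate. The abstract does not give the actual likelihood function, so the scoring formula, the add-one smoothing, and the toy counts in `NGRAM_COUNTS` are all assumptions rather than the system's real data or method.

```python
import math

# Hypothetical character-bigram counts (toy values, not Google n-gram data).
NGRAM_COUNTS = {
    "我們": 5000,
    "們再": 40,
    "們在": 60,
    "再見": 900,
    "在見": 3,
}

def ngrams(sentence, n):
    """Return all contiguous character n-grams of the sentence."""
    return [sentence[i:i + n] for i in range(len(sentence) - n + 1)]

def log_likelihood(sentence, n=2):
    """Assumed score: sum of log(count + 1) over the sentence's n-grams."""
    return sum(math.log(NGRAM_COUNTS.get(g, 0) + 1) for g in ngrams(sentence, n))

def best_candidate(candidates):
    """Pick the candidate sentence with the highest assumed likelihood."""
    return max(candidates, key=log_likelihood)

if __name__ == "__main__":
    # The first candidate contains a plausible misspelling (在 for 再).
    print(best_candidate(["我們在見", "我們再見"]))
```

With these toy counts the bigram 再見 is far more frequent than 在見, so the correctly spelled candidate receives the higher score.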
This article proposes a summarization system for multiple documents. It employs named entities and other signatures to cluster news from different sources, and punctuation marks, linking elements, and topic chains to identify meaningful units (MUs). Nouns and verbs are used to identify similar MUs, and focusing and browsing models are applied to present the summarization results. To reduce information loss during summarization, informative words in a document are introduced. For evaluation, a question answering (QA) system is proposed as a substitute for human assessors. In large-scale experiments with 140 questions over 17,877 documents, the results show that the models using informative words outperform a pure heuristic voting-only strategy by news reporters. The model can readily be extended to summarize multilingual news from multiple sources.
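The idea of comparing meaningful units by their nouns and verbs can be sketched as follows. This is a simplified illustration under several assumptions: units are split at punctuation marks only (linking elements and topic chains are ignored), tokens arrive already tagged with the hypothetical labels "N" and "V", and Jaccard overlap stands in for whatever similarity measure the article actually uses.

```python
import re

def split_into_units(text):
    """Split text into candidate units at punctuation marks (a simplification)."""
    parts = re.split(r"[,.;!?，。；！？]", text)
    return [p.strip() for p in parts if p.strip()]

def content_words(tagged_unit):
    """Keep only the words tagged as noun ("N") or verb ("V")."""
    return {word for word, tag in tagged_unit if tag in ("N", "V")}

def similarity(unit_a, unit_b):
    """Assumed measure: Jaccard overlap of the units' noun/verb sets."""
    a, b = content_words(unit_a), content_words(unit_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

if __name__ == "__main__":
    print(split_into_units("The quake hit Taipei, rescuers arrived."))
    unit_1 = [("the", "DET"), ("quake", "N"), ("hit", "V"), ("Taipei", "N")]
    unit_2 = [("a", "DET"), ("quake", "N"), ("struck", "V"), ("Taipei", "N")]
    print(similarity(unit_1, unit_2))
```

Units whose similarity exceeds some threshold could then be grouped so that only one representative appears in the summary; the threshold itself would need to be tuned on real data.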
This paper describes the design of an ellipsis and coreference resolution module integrated into a computerized virtual patient dialogue system. Real medical diagnosis dialogues were collected and analyzed. Several groups of diagnosis-related concepts were defined and used to construct rules, patterns, and features to detect and resolve ellipsis and coreference. The best F-scores for ellipsis detection and resolution were 89.15% and 83.40%, respectively. The best F-scores for phrasal coreference detection and resolution were 93.83% and 83.40%, respectively. The accuracy of pronominal anaphora resolution was 92% for third-person singular pronouns referring to specific entities, and 97.31% for other pronouns.