In this article, we will reconsider the notion of a word as the basic unit of analysis in language and propose that, in an information- and meaning-carrying system, the unit of analysis should be a unit of meaning (UM). Such a UM may consist of one or more words. A method will be promoted that attempts to automatically retrieve UMs from corpora. To illustrate the results that may be obtained by this method, the node word ‘stroke’ will be used in a small study. The results will be discussed, with implications considered for both monolingual and multilingual use. The monolingual study will benefit from using the British National Corpus, while the multilingual study introduces a parallel corpus consisting of Swedish novels and their translations into English.
More and more researchers have recognized the potential value of parallel corpora for research on machine translation and machine-aided translation. This paper examines how Chinese-English translation units can be extracted from a parallel corpus. An iterative algorithm based on the degree of word association is proposed to identify multiword units in Chinese and English. Chinese-English translation equivalent pairs are then extracted from the parallel corpus. We also compare different statistical association measures.
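The abstract does not specify the association measure or the stopping rule, so the following is only a minimal sketch of what an iterative, association-based multiword-unit extractor might look like. It assumes pointwise mutual information (PMI) as the association measure, pre-tokenized input (Chinese text would need word segmentation first), and a fixed threshold; the function names `merge_pass` and `extract_multiword_units` and all parameters are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def merge_pass(sentences, threshold, min_count=2):
    """One pass: join adjacent tokens whose PMI meets the threshold.

    PMI is an assumed stand-in for the paper's unspecified association measure.
    """
    unigrams = Counter(tok for sent in sentences for tok in sent)
    bigrams = Counter(
        (a, b) for sent in sentences for a, b in zip(sent, sent[1:])
    )
    total = sum(unigrams.values())
    n_bigrams = sum(bigrams.values())

    def pmi(a, b):
        p_ab = bigrams[(a, b)] / n_bigrams
        p_a = unigrams[a] / total
        p_b = unigrams[b] / total
        return math.log2(p_ab / (p_a * p_b))

    # Pairs that are frequent enough and strongly associated.
    strong = {
        pair
        for pair, count in bigrams.items()
        if count >= min_count and pmi(*pair) >= threshold
    }

    merged, changed = [], False
    for sent in sentences:
        out, i = [], 0
        while i < len(sent):
            # Greedy left-to-right merge of a strongly associated pair.
            if i + 1 < len(sent) and (sent[i], sent[i + 1]) in strong:
                out.append(sent[i] + "_" + sent[i + 1])
                i += 2
                changed = True
            else:
                out.append(sent[i])
                i += 1
        merged.append(out)
    return merged, changed


def extract_multiword_units(sentences, threshold=3.0, max_iters=5):
    """Repeat merge passes until nothing merges or max_iters is reached."""
    for _ in range(max_iters):
        sentences, changed = merge_pass(sentences, threshold)
        if not changed:
            break
    return sentences


if __name__ == "__main__":
    corpus = [
        ["machine", "translation", "is", "hard"],
        ["machine", "translation", "is", "useful"],
        ["the", "parallel", "corpus", "is", "large"],
        ["a", "parallel", "corpus", "is", "useful"],
    ]
    for sent in extract_multiword_units(corpus):
        print(" ".join(sent))
```

Because merged tokens are fed back into the next pass, units longer than two words can emerge over successive iterations. Swapping `pmi` for another measure such as the Dice coefficient or log-likelihood would be one way to reproduce the kind of comparison the abstract mentions.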