This paper presents a predicate-argument structure analysis that simultaneously conducts zero-anaphora resolution. By adding noun phrases as candidate arguments that are not only in the sentence of the target predicate but also outside of the sentence, our analyzer identifies arguments regardless of whether they appear in the sentence or not. Because we adopt discriminative models based on maximum entropy for argument identification, we can easily add new features. We add language model scores as well as contextual features. We also use contextual information to restrict candidate arguments.
In this paper, we present a novel global reordering model that can be incorporated into standard phrase-based statistical machine translation. Unlike previous local reordering models that emphasize the reordering of adjacent phrase pairs (Tillmann and Zhang, 2005), our model explicitly models the reordering of long distances by directly estimating the parameters from the phrase alignments of bilingual training sentences. In principle, the global phrase reordering model is conditioned on the source and target phrases that are currently being translated, and the previously translated source and target phrases. To cope with sparseness, we use N-best phrase alignments and bilingual phrase clustering, and investigate a variety of combinations of conditioning factors. Through experiments, we show, that the global reordering model significantly improves the translation accuracy of a standard Japanese-English translation task.
We present a portable translator that recognizes and translates phrases on signboards and menus as captured by a builtin camera. This system can be used on PDAs or mobile phones and resolves the difficulty of inputting some character sets such as Japanese and Chinese if the user doesn't know their readings. Through the high speed mobile network, small images of signboards can be quickly sent to the recognition and translation server. Since the server runs state of the art recognition and translation technology and huge dictionaries, the proposed system offers more accurate character recognition and machine translation.
This paper proposes three modules based on latent topics of documents for alleviating "semantic drift" in bootstrapping entity set expansion. These new modules are added to a discriminative bootstrapping algorithm to realize topic feature generation, negative example selection and positive example disambiguation. In this study, we model latent topics with LDA (Latent Dirichlet Allocation) in an unsupervised way. Experiments show that the accuracy of the extracted entities is improved by 6.7 to 28.2% depending on the domain.
We introduce a multi-language named-entity recognition system based on HMM. Japanese, Chinese, Korean and English versions have already been implemented. In principle, it can analyze any other language if we have training data of the target language. This system has a common analytical engine and it can handle any language simply by changing the lexical analysis rules and statistical language model. In this paper, we describe the architecture and accuracy of the named-entity system, and report preliminary experiments on automatic bilingual named-entity dictionary construction using the Japanese and English named-entity recognizer.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.