This paper describes SentencePiece, a language-independent subword tokenizer and detokenizer designed for neural-based text processing, including Neural Machine Translation (NMT). It provides open-source C++ and Python implementations for subword units. While existing subword segmentation tools assume that the input is pre-tokenized into word sequences, SentencePiece can train subword models directly from raw sentences, which allows us to build a purely end-to-end and language-independent system. We perform a validation experiment of NMT on English-Japanese machine translation, and find that it is possible to achieve comparable accuracy with subword models trained directly from raw sentences. We also compare the performance of subword training and segmentation with various configurations. SentencePiece is available under the Apache 2 license at
Examines research in cognitive psychology, which has in the past paid little attention to the olfactory modality. But there is now a significant body of literature on the role of the olfactory system in memory and cognition. Human beings possess an excellent ability to detect and discriminate odors, but they typically have great difficulty in identifying particular odorants. This results partly from the use of an impoverished and idiosyncratic language to describe olfactory experiences, which are normally encoded either in a rudimentary sensory form or as part of a complex but highly specific biographical episode. Consequently, linguistic processes play only a very limited role in olfactory processing, whereas hedonic factors seem to be of considerable importance.