SUMMARY
Applying existing keyword retrieval to image retrieval causes several problems: database developers must describe the contents of each photograph in detail, and even then adequate retrieval is difficult. These problems have been resolved by developing an associative retrieval technology that uses semantic vectors instead of keywords, giving the retrieval method an associative function similar to the one human beings possess. A semantic vector dictionary of more than 100,000 words was built from encyclopedia text. This paper describes an experimental image retrieval system for 36,000 photographs that uses this semantic vector dictionary. The system associates the words input by a user with the knowledge in the encyclopedia text and outputs ranked retrieval results. The effectiveness of this associative retrieval method was confirmed by evaluating content retrieval of images on a small benchmark, and by implementing an adaptive learning function that adjusts the semantic vectors when a retrieval result differs from the user's subjective judgment.
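To make the idea of ranking by semantic vectors concrete, the following is a minimal sketch, assuming that a query and each image caption are represented by the normalized sum of their words' dictionary vectors and that images are ranked by cosine similarity. The toy dictionary `SEMANTIC_DICT`, its four dimensions, and the functions `text_vector`, `rank_images`, and `adapt` are hypothetical; the `adapt` step is a simple feedback rule chosen for illustration and is not necessarily the adaptive learning function used in the system described above.

```python
import math

# Hypothetical toy semantic vector dictionary: word -> feature weights.
# The real dictionary was built from encyclopedia text; these 4-dimensional
# vectors and their words are invented purely for illustration.
SEMANTIC_DICT = {
    "sea":      [0.9, 0.1, 0.0, 0.2],
    "beach":    [0.8, 0.3, 0.0, 0.1],
    "mountain": [0.1, 0.9, 0.1, 0.0],
    "snow":     [0.2, 0.7, 0.6, 0.0],
    "festival": [0.0, 0.1, 0.2, 0.9],
}
DIM = 4


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def text_vector(words):
    """Sum the dictionary vectors of known words, then normalize."""
    total = [0.0] * DIM
    for w in words:
        for i, x in enumerate(SEMANTIC_DICT.get(w, [0.0] * DIM)):
            total[i] += x
    return normalize(total)


def cosine(a, b):
    """Cosine similarity of two unit-length (or zero) vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_images(query_words, image_vectors):
    """Return (image_id, score) pairs sorted by descending similarity."""
    q = text_vector(query_words)
    scores = [(img, cosine(q, v)) for img, v in image_vectors.items()]
    return sorted(scores, key=lambda p: p[1], reverse=True)


def adapt(image_vec, query_vec, rate=0.2):
    """Hypothetical feedback step: nudge an image vector toward a query
    the user judged relevant, then renormalize."""
    mixed = [(1 - rate) * v + rate * q for v, q in zip(image_vec, query_vec)]
    return normalize(mixed)
```

For example, with images captioned "sea beach", "mountain snow", and "festival", the query "beach" ranks the seaside photograph first even though its caption vector is not identical to the query, which is the associative behavior keyword matching lacks.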
SUMMARY
The problem with distributed representations generated by neural networks is that the meaning of their features is difficult to understand. We propose a new method that gives a specific meaning to each node of a hidden layer by introducing a manually created word semantic vector dictionary into the initial weights and by using paragraph vector models. We conducted experiments to test our hypotheses using a single-domain benchmark for Japanese Twitter sentiment analysis, and then evaluated the expandability of the method using a diverse, large-scale benchmark. Moreover, we tested the domain independence of the method using a Wikipedia corpus. Our experimental results demonstrated that, on the Twitter sentiment analysis task with the single-domain benchmark, the learned vectors outperformed the existing paragraph vector. We also assessed the readability of document embeddings, that is, distributed representations of documents, in a user test. In this paper, readability means that people can understand the meaning of the large-weighted features of a distributed representation. When one of the paragraph vector models learned the document embeddings, 52.4% of the top five weighted hidden nodes were related to the corresponding tweets. For the expandability evaluation, we improved the dictionary based on the results of the hypothesis test and examined the relationship between the readability of the learned word vectors and the task accuracy of Twitter sentiment analysis using the diverse, large-scale benchmark. We also conducted a word similarity task on the Wikipedia corpus to test the domain independence of the method. The expandability evaluation showed that the method performs better than or comparably to the paragraph vector. In addition, both the objective and subjective evaluations support the claim that each hidden node maintains a specific meaning. Thus, the proposed method succeeded in improving readability.
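The core idea of tying each hidden node to a named feature can be sketched as follows, assuming a model in which word and document vectors share the same hidden layer (as in the averaged variant of the paragraph vector's PV-DM model). The feature names in `FEATURES`, the entries of `WORD_DICT`, the noise scale, and the helper functions are all hypothetical, and paragraph vector training itself is omitted: `average_vector` merely stands in for a learned document embedding so that `top_features` can show how the top five weighted nodes would be read off for a readability check.

```python
import random

# Hypothetical feature names: one per hidden node. In the proposed method these
# come from a manually created word semantic vector dictionary; the names and
# weights below are invented for illustration.
FEATURES = ["positive", "negative", "food", "weather", "sports"]

# Hypothetical dictionary: word -> {feature name: weight}.
WORD_DICT = {
    "delicious": {"positive": 1.0, "food": 1.0},
    "rain":      {"negative": 0.5, "weather": 1.0},
    "goal":      {"sports": 1.0, "positive": 0.5},
}


def init_word_weights(vocab, noise=0.01, seed=0):
    """Build the initial word-to-hidden-layer weight matrix.

    Words found in the dictionary get their feature weights on the matching
    hidden nodes; every entry also receives small random noise (an assumption
    of this sketch), so words absent from the dictionary start near zero.
    """
    rng = random.Random(seed)
    weights = {}
    for word in vocab:
        row = [rng.uniform(-noise, noise) for _ in FEATURES]
        for feat, w in WORD_DICT.get(word, {}).items():
            row[FEATURES.index(feat)] += w
        weights[word] = row
    return weights


def top_features(vector, k=5):
    """Return the names of the k largest-weighted hidden nodes, i.e. the
    features a reader would inspect to judge readability."""
    order = sorted(range(len(vector)), key=lambda i: vector[i], reverse=True)
    return [FEATURES[i] for i in order[:k]]


def average_vector(words, weights):
    """Average the word rows; a hypothetical, untrained stand-in for a
    document embedding that lives in the same feature space."""
    rows = [weights[w] for w in words if w in weights]
    return [sum(col) / len(rows) for col in zip(*rows)]
```

Because every hidden node starts out aligned with a named feature, a large weight on, say, the `food` node can be read directly as "this text is about food"; whether that alignment survives training is exactly what the readability evaluation above measures.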
key words: distributed representation, word semantic vector dictionary, paragraph vector, word2vec, Twitter, sentiment analysis