While most text classification studies focus on monolingual documents, this article presents an empirical study of a multilingual text sentiment classification model based on Convolutional Networks (ConvNets). The novel approach consists of feeding the deep neural network a single input text source composed of reviews written in different languages, without any code-switching indication or language translation. We construct a multilingual opinion corpus combining three languages, English, French and Greek, all drawn from restaurant reviews. Despite the limited contextual information due to the relatively compact text content, no prior knowledge is used. The neural network exploits n-gram level information, and the experimental results show high accuracy for sentiment polarity prediction, both positive and negative, which leads us to deduce that ConvNets feature extraction is language independent.
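As an illustration of the input scheme described in the abstract above, the following minimal sketch merges reviews from several languages into one unlabelled stream and lists the n-token windows that a width-n 1D convolution would slide over. The sample reviews, the function names `build_mixed_corpus` and `ngram_windows`, and the whitespace tokenisation are all assumptions made for illustration; they are not taken from the paper.

```python
import random

# Hypothetical toy reviews (not from the paper's corpus).
# The language code is listed only to show that it is discarded before training.
REVIEWS = [
    ("en", "the food was great", 1),
    ("fr", "le service était très lent", 0),
    ("el", "το φαγητό ήταν υπέροχο", 1),
]


def build_mixed_corpus(reviews, seed=0):
    """Drop the language field and shuffle, leaving one undifferentiated stream."""
    samples = [(text, label) for _lang, text, label in reviews]
    random.Random(seed).shuffle(samples)
    return samples


def ngram_windows(tokens, n):
    """Return the consecutive n-token windows a width-n 1D convolution covers."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


corpus = build_mixed_corpus(REVIEWS)
# Whitespace tokenisation is an assumption; the abstract does not specify one.
windows = ngram_windows("the food was great".split(), 2)
```

Each element of `corpus` is a `(text, label)` pair with no language marker, which mirrors the claim that the network receives no code-switching indication. In a real ConvNet each window would be mapped to embeddings and passed through learned filters; that part is omitted here.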
In this article we introduce an empirical study of multilingual and multi-topic opinion classification. Its particularity lies in the reviews, which are written in different languages and refer to different but semantically close topics: restaurants and hotels. Our key objective is to demonstrate the ability of a deep learning model to determine the sentiment polarity of reviews and to classify their topics in a multilingual environment without any prior knowledge. For this work, we use unstructured text data collected from the web, written in French, English and Greek (a language less represented in opinion data). The incorporated corpus-based input is raw, used without any pre-processing, translation, annotation or additional knowledge features. For the machine learning approach, we use two different deep neural networks: Convolutional Neural Networks (ConvNets) and Recurrent Neural Networks (RNNs). The learning model exploits n-gram level information and achieves high accuracy for sentiment polarity and topic classification according to the experimental tests and results. We argue that a multilingual environment composed of reviews in semantically close domains does not impact network performance, which leads us to deduce that semantic feature extraction with ConvNets and RNNs is language and context independent. Following these results, we promote a simple yet powerful approach for feeding deep networks in a multilingual context.
We present a method for automatically extracting and gathering specific text data from web pages, creating a thematic corpus of reviews for opinion mining and sentiment analysis. The internet is an immense source of machine-readable texts [11] suitable for linguistic corpus studies [3][1]. However, the specific tools from the web information extraction research domain, as well as those from NLP, do not include an open source system able to provide a thematic corpus in response to an end-user request [16]. The need to use natural texts as a data bank for opinion mining and sentiment analysis has increased with the expansion of digital interaction between users and blogs, forums and social networks. The RevScrap system is designed to provide an intuitive, easy-to-use interface able to extract specific information from relevant web pages returned by a search engine request and to create a corpus composed of comments, reviews and opinions, as expressed through users' experience and feedback. The corpus is structured in XML documents, reflecting Singler's design criteria [4].
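To make the idea of an XML-structured review corpus concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The element names (`corpus`, `review`), the attributes (`topic`, `id`, `source`), the sample reviews and the helper `build_corpus_xml` are hypothetical; the abstract states only that the corpus is stored as XML and does not give RevScrap's actual schema.

```python
import xml.etree.ElementTree as ET


def build_corpus_xml(topic, reviews):
    """Wrap extracted reviews in a simple XML tree (hypothetical schema)."""
    root = ET.Element("corpus", attrib={"topic": topic})
    for index, review in enumerate(reviews, start=1):
        node = ET.SubElement(
            root,
            "review",
            attrib={"id": str(index), "source": review["source"]},
        )
        node.text = review["text"]
    return root


# Invented sample data standing in for scraped reviews.
reviews = [
    {"source": "example.org/page1", "text": "Friendly staff and tasty food."},
    {"source": "example.org/page2", "text": "Too noisy, would not return."},
]

root = build_corpus_xml("restaurants", reviews)
xml_text = ET.tostring(root, encoding="unicode")
```

Keeping one `review` element per extracted comment would make the corpus easy to filter by source or topic later, though the real system may organise its documents differently.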