The paper describes the results of the First Russian Paraphrase Detection Shared Task, held in St. Petersburg, Russia, in October 2016. Research on paraphrase extraction, detection, and generation has been developing successfully for a long time, whereas interest in the problem within the Russian computational linguistics community has surged only recently. We try to close this gap by introducing the ParaPhraser.ru project, dedicated to collecting a Russian paraphrase corpus, and by organizing a Paraphrase Detection Shared Task that uses this corpus as training data. The participants applied a wide variety of techniques to paraphrase detection, from rule-based approaches to deep learning. The results reflect the following tendencies: the best scores were obtained by traditional classifiers combined with fine-grained linguistic features; however, complex neural networks, shallow methods, and purely technical methods also achieved competitive results.
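As a rough illustration of what a "shallow" approach to paraphrase detection can look like, the sketch below computes a few lexical-overlap features for a sentence pair and applies a fixed threshold. It is not the method of any shared-task participant; the feature set, the function names (tokenize, pair_features, is_paraphrase), the example sentences, and the 0.5 threshold are hypothetical choices made for illustration.

```python
import re

def tokenize(text):
    # For str patterns, \w is Unicode-aware by default, so Cyrillic words are matched whole.
    return re.findall(r"\w+", text.lower())

def jaccard(a, b):
    """Share of items common to both sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def pair_features(s1, s2):
    """A few shallow lexical-overlap features for a sentence pair (illustrative set)."""
    t1, t2 = tokenize(s1), tokenize(s2)
    return {
        "unigram_jaccard": jaccard(set(t1), set(t2)),
        # Adjacent word pairs are sensitive to word order, unlike unigrams.
        "bigram_jaccard": jaccard(set(zip(t1, t1[1:])), set(zip(t2, t2[1:]))),
        "length_ratio": min(len(t1), len(t2)) / max(len(t1), len(t2), 1),
    }

def is_paraphrase(s1, s2, threshold=0.5):
    # Hypothetical decision rule: the threshold is illustrative, not tuned on any data.
    return pair_features(s1, s2)["unigram_jaccard"] >= threshold

print(pair_features("Кошка сидит на окне", "На окне сидит кошка"))
print(is_paraphrase("Кошка сидит на окне", "Собака лежит в будке"))
```

In the first pair, reordering the words keeps unigram overlap at 1.0 while bigram overlap drops, which illustrates why combining several overlap features can be more informative than relying on any single one. A classifier-based system would pass such features, together with finer-grained linguistic ones, to a trained model instead of thresholding one of them by hand.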
Abstract. In this paper, an information extraction method for a restaurant recommendation system is proposed. We aim to develop an information extraction (IE) system intended to serve as a module of the recommendation system. The IE system is meant to gather information about different aspects of restaurants from online reviews, structure it, and feed the obtained data to the recommendation module. The analyzed frames include service quality, food quality, cuisine, price level, noise level, etc. In this paper, service quality, cuisine type, and food quality are considered. As part of the corpus preprocessing phase, a method for analyzing a corpus of Russian reviews is proposed. Its importance is shown in the experimental phase, where the application of machine learning techniques to aspect extraction is analyzed. It is shown that insights obtained at the corpus preprocessing stage can help improve the performance of machine learning models.
Keywords: corpus analysis, restaurant reviews, information extraction, recommendation system, machine learning.
Introduction
In this paper, an information extraction (IE) method for a Russian restaurant recommendation system is proposed. It is based on linguistic information gathered through corpus analysis and can be applied to similar domains and under-resourced languages. Our IE framework is part of a project aimed at implementing a restaurant recommendation system, and in this paper we consider two tasks: review corpus analysis and the application of machine learning techniques to the problem in question. In the latter task, we use the information obtained in the corpus analysis phase. Our approach includes opinion mining, since restaurant characteristics are both objective and subjective. Our corpus analysis method is based on non-contiguous bigrams and part-of-speech (POS) distribution analysis.
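The corpus analysis described above relies on non-contiguous bigrams and POS distributions. A minimal sketch of both counts follows, assuming reviews are already tokenized and, for the POS part, tagged by some external tagger. The gap parameter max_gap, the function names, and the tag labels in the usage lines are hypothetical, since the paper's exact settings are not given in this excerpt.

```python
from collections import Counter

def noncontiguous_bigrams(tokens, max_gap=2):
    """Count ordered word pairs (w_i, w_j), i < j, with at most `max_gap` words between them."""
    counts = Counter()
    for i, left in enumerate(tokens):
        # j - i - 1 words are skipped, so j runs from i + 1 to i + 1 + max_gap (clipped at the end).
        for j in range(i + 1, min(i + 2 + max_gap, len(tokens))):
            counts[(left, tokens[j])] += 1
    return counts

def corpus_bigrams(reviews, max_gap=2):
    """Aggregate non-contiguous bigram counts over a list of tokenized reviews."""
    total = Counter()
    for tokens in reviews:
        total.update(noncontiguous_bigrams(tokens, max_gap))  # Counter.update adds counts
    return total

def pos_distribution(tagged):
    """Relative frequency of each POS tag in a list of (word, tag) pairs."""
    counts = Counter(tag for _, tag in tagged)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()} if total else {}

# Usage with a toy, pre-tokenized review; the tags are illustrative UD-style labels.
review = ["вкусная", "паста", "и", "быстрое", "обслуживание"]
pairs = noncontiguous_bigrams(review, max_gap=1)
print(pairs[("вкусная", "и")])  # this pair skips one word ("паста")
tagged = [("вкусная", "ADJ"), ("паста", "NOUN"), ("и", "CCONJ"),
          ("быстрое", "ADJ"), ("обслуживание", "NOUN")]
print(pos_distribution(tagged))
```

Setting max_gap=0 reduces the function to ordinary adjacent bigrams; larger values also count pairs separated by intervening words, such as a conjunction or an intensifier.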
Trigger word dictionaries are learned using a bootstrapping method.
E. Pronoza et al.
The frames to be extracted include service quality, food quality, cuisine type, price level, noise level, etc. Each frame has its own set of aspects. We assume that the most important characteristics of a restaurant are service quality, food quality, and cuisine type; therefore, we consider only these three frames and focus on extracting their aspects. This assumption is supported by the distribution of aspects in the data.
We also expect that the proposed IE system can be highly effective despite the difficulties posed by the structure of a typical Russian restaurant review. Although key information about restaurant characteristics does not always lie on the surface, tuning machine learning models according to the results of corpus analysis can help improve the performance of an IE system.
Related Work
The information extraction (IE) task as part of recommendation system development is discussed in [21]. The authors propose a rule-based approach to extracting keywords from a user's email. These keywords are put in...
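The trigger word dictionaries mentioned in the introduction are said to be learned by bootstrapping, but the paper's exact procedure is not described in this excerpt. The sketch below shows one generic variant under stated assumptions: starting from hypothetical seed words for a frame, it collects the words that appear next to the current triggers, scores every other word by how often its own neighbours are among those words, and promotes the best-scoring candidates on each iteration. The function names, the toy reviews, the window size, and the top_k cutoff are illustrative choices.

```python
from collections import Counter

def neighbours(tokens, i, window):
    """Words within `window` positions of tokens[i], excluding position i itself."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    return [tokens[j] for j in range(lo, hi) if j != i]

def bootstrap_triggers(sentences, seeds, iterations=3, top_k=1, window=1):
    """Grow a trigger-word set from seed words (a generic bootstrapping sketch)."""
    lexicon = set(seeds)
    for _ in range(iterations):
        # 1. Context "patterns": words seen next to the current trigger words.
        patterns = Counter()
        for tokens in sentences:
            for i, word in enumerate(tokens):
                if word in lexicon:
                    patterns.update(neighbours(tokens, i, window))
        # 2. Score every non-trigger word by how often its neighbours are pattern words.
        scores = Counter()
        for tokens in sentences:
            for i, word in enumerate(tokens):
                if word not in lexicon:
                    scores[word] += sum(patterns[c] for c in neighbours(tokens, i, window))
        # 3. Promote the best candidates; stop when nothing scores above zero.
        new = [w for w, s in scores.most_common(top_k) if s > 0]
        if not new:
            break
        lexicon.update(new)
    return lexicon

# Hypothetical toy reviews, already tokenized and lowercased.
toy_reviews = [
    ["вкусная", "паста"], ["свежая", "паста"], ["свежая", "рыба"],
    ["вкусная", "рыба"], ["вежливый", "официант"], ["грубый", "официант"],
]
print(sorted(bootstrap_triggers(toy_reviews, {"вкусная"})))   # food quality seed
print(sorted(bootstrap_triggers(toy_reviews, {"вежливый"})))  # service quality seed
```

On this toy data the food quality seed picks up "свежая" but none of the service words, because the two groups never share neighbours. On real reviews, bootstrapping can drift toward unrelated words, which is why the number of words promoted per iteration is kept small here.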