<p>The socio-economic development of a country depends mainly on adequate integrated water resources management (IWRM). Mining and agriculture, two of the main economic activities in Bolivia, negatively affect the quality and availability of water resources. In addition, every year floods and droughts hit the most vulnerable populations in different regions of Bolivia. Floods, along with landslides caused by heavy precipitation, represent the greatest hydroclimatological risk factor in the country. Beyond these challenges, water treatment for water supply is inefficient, which can lead to further problems such as disease. Media such as newspapers, television and radio report on these water resources problems, which recur year after year. Thanks to advances in technology, this information is also available digitally. Likewise, people use social networks such as Twitter to express their opinions on specific topics. The information found in both the media and social networks is called qualitative information.</p><p>This digital information will be extracted using web crawling and web scraping techniques, which allow the process to be automated. The extraction is driven by keywords in the context of water resources in Bolivia, such as the names of different water bodies in a basin. Once extracted, the information will be transformed into a quantitative form so that it is useful for the planning and decision-making processes of IWRM in Bolivia.</p><p>This research focuses on applying Natural Language Processing to the digital information found for three hydrological basins in Bolivia, in order to recognize how Bolivian society relates to the management of water resources. These hydrological basins are La Paz - Choqueyapu, Tupiza and Pampa &#8211; Huari. 
Initially, the digital information studied in this research comes from three Bolivian newspapers and from Twitter. A sentiment analysis classification model is developed in the Python programming language. To preserve the semantic information and the different words in the text, the Word2Vec model will be used. The extracted digital information is pre-processed by removing punctuation marks and stop words, which do not add sentiment to a text. Once pre-processed, the information is divided into two sets, training and testing. The training data will be used to train the Word2Vec model. The output of the model is a value that determines whether the sentiment of the text is positive, neutral or negative. Once the model is trained, the testing data, which was not used in training, will be applied to evaluate the performance of the model.</p><p>This research helps to identify key elements, actors, frequent words related to IWRM and factors related to river health, and to improve the concept of citizen science. The results are mapped by geolocation as a frequency distribution that considers the digital perception (sentiment analysis) found and the frequency with which a topic is mentioned in the analysed digital information.</p>
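<p>The pre-processing and training/testing split described above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the implementation used in this research: the stop-word list, the helper names <code>preprocess</code> and <code>train_test_split</code>, the 20% test fraction and the sample sentence are all assumptions made for the example, and the Word2Vec training step itself is omitted.</p>

```python
import random
import string

# Hypothetical, deliberately tiny Spanish stop-word list (assumption for
# illustration; a real study would use a much fuller list).
STOP_WORDS = {"el", "la", "los", "las", "de", "del", "en", "y", "que", "un", "una", "por"}

# Punctuation to strip: ASCII punctuation plus the Spanish inverted marks.
PUNCTUATION = string.punctuation + "¡¿"


def preprocess(text):
    """Lowercase the text, strip punctuation, and drop stop words."""
    cleaned = text.lower().translate(str.maketrans("", "", PUNCTUATION))
    return [token for token in cleaned.split() if token not in STOP_WORDS]


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of items and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Example with an invented sentence (not taken from the studied corpus).
tokens = preprocess("¡El río Choqueyapu está contaminado!")
```

<p>Keeping the split seeded makes the evaluation repeatable, since the same texts are held out for testing on every run.</p>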