The traditional approach to sentiment classification relies on machine learning techniques; however, these models cannot capture the full richness of text from social media, personal web pages, blogs, and similar sources, because they ignore its semantics. Knowledge graphs offer a way to extract structured knowledge from images and texts in order to facilitate their semantic analysis. This work proposes a new hybrid approach for Sentiment Analysis based on Knowledge Graphs and Deep Learning techniques to identify the sentiment polarity (positive or negative) of short documents, such as posts on Twitter. In this proposal, tweets are represented as graphs; then, graph similarity metrics and a Deep Learning classification algorithm are applied to produce sentiment predictions. This approach facilitates the traceability and interpretability of the classification results, thanks to the integration of the Local Interpretable Model-agnostic Explanations (LIME) model at the end of the pipeline. LIME raises trust in predictive models, since the model is no longer a black box: it becomes possible to understand and interpret how the network distinguishes between sentiment polarities. Each phase of the proposed approach is described: pre-processing, graph construction, dimensionality reduction, graph similarity, sentiment prediction, and interpretability. The proposal is compared with Deep Learning models for Sentiment Analysis based on character n-gram embeddings. Results show that the proposal outperforms classical n-gram models, with a recall of up to 89% and an F1-score of 88%.
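The idea of representing tweets as graphs and comparing them with a graph similarity metric can be illustrated with a minimal sketch. The abstract does not say how the graphs are built or which metrics are used, so everything here is an assumption made for illustration: the word co-occurrence construction, the Jaccard overlap of edge sets, and the hypothetical helper names `tweet_to_edges` and `edge_jaccard`.

```python
import re


def tweet_to_edges(text, window=2):
    """Build an undirected word co-occurrence graph, returned as a set of edges.

    Assumption: two words are linked when they appear within `window`
    positions of each other. With window=2, only adjacent words are linked.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    edges = set()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + window]:
            if left != right:
                # frozenset makes the edge undirected: {a, b} == {b, a}
                edges.add(frozenset((left, right)))
    return edges


def edge_jaccard(edges_a, edges_b):
    """Jaccard similarity of two edge sets: shared edges over all edges."""
    if not edges_a and not edges_b:
        return 1.0  # two empty graphs are treated as identical
    return len(edges_a & edges_b) / len(edges_a | edges_b)


graph_a = tweet_to_edges("I love this phone")
graph_b = tweet_to_edges("I love this movie")
similarity = edge_jaccard(graph_a, graph_b)
print(similarity)
```

In a pipeline like the one described, similarity scores of this kind would be one possible input to the classifier; a knowledge-graph representation would carry richer, typed relations than this plain co-occurrence graph.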
Sentiment analysis helps determine how users perceive different aspects of everyday life, such as product preferences in the market, users' level of trust in work environments, or political preferences. The idea is to predict trends or preferences based on sentiments. In this article we evaluate the most common techniques used for this type of analysis, considering machine learning and deep learning techniques. Our main contribution is a proposed methodological strategy covering data preprocessing, the construction of predictive models, and their evaluation. In the results, the best classical model was SVM, with 78% precision and a 79% F1 score. The best results, however, came from the Deep Learning models rather than the classical ones. The best-performing model was the Deep Learning model Long Short Term Memory (LSTM), reaching 88% precision and an 89% F1 score. The worst of the Deep Learning models was the CNN, with 77% for both precision and F1 score. In conclusion, the Long Short Term Memory (LSTM) algorithm showed the best performance, reaching up to 89% precision.
Twitter geolocation is useful for various purposes, including tracking COVID-19 perceptions, analyzing political trends, and managing natural disasters. However, accurately predicting geolocations based on tweet content remains a challenge. In the past, machine learning approaches have tried to solve this problem by training prediction models on previously seen data, but these models often struggle to generalize to unseen places. To overcome these limitations, in this work we present a framework based on Natural Language Processing (NLP), Knowledge Graphs (KG), and the Semantic Web to find geographical entities in tweets’ content. KGs facilitate the extraction of structured knowledge from texts, supporting an NLP-based semantic analysis that searches for geographical coordinates associated with the entities of the KG; if a tweet explicitly mentions places, the Semantic Web is used to find geographical information associated with the entities present in its content. To evaluate the precision of the prediction algorithm, we compare our predicted latitude and longitude coordinates against the AlbertaT6 floods dataset. Our results show an F1 score of up to 0.851 within a 10-kilometer radius.
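Scoring a prediction "within a 10 kilometer radius" requires a distance between predicted and reference coordinates. The sketch below uses the haversine great-circle formula as one plausible way to make that check. It is an assumption, not the authors' stated method: the abstract does not say which distance function is used or how hits are turned into an F1 score. The names `haversine_km` and `within_radius` and the sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(predicted, reference, radius_km=10.0):
    """True when the predicted (lat, lon) lies within radius_km of the reference."""
    return haversine_km(*predicted, *reference) <= radius_km


# Hypothetical coordinates, chosen only to exercise the check.
predicted = (51.05, -114.07)
reference = (51.06, -114.07)
print(within_radius(predicted, reference))
```

Each prediction counted as a hit or a miss in this way could then feed a standard precision, recall, and F1 computation.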