Search citation statements
Paper Sections
Citation Types
Year Published
Publication Types
Relationship
Authors
Journals
One of the key AI tools for textual corpora exploration is natural language question-answering (QA). Unlike keyword-based search engines, QA algorithms receive and process natural language questions and produce precise answers to these questions, rather than long lists of documents that need to be manually scanned by the users. State-of-the-art QA algorithms based on DNNs were successfully employed in various domains. However, QA in the genealogical domain is still underexplored, while researchers in this field (and other fields in humanities and social sciences) can highly benefit from the ability to ask questions in natural language, receive concrete answers and gain insights hidden within large corpora. While some research has been recently conducted for factual QA in the genealogical domain, to the best of our knowledge, there is no previous research on the more challenging task of numerical aggregation QA (i.e., answering questions combining aggregation functions, e.g., count, average, max). Numerical aggregation QA is critical for distant reading and analysis for researchers (and the general public) interested in investigating cultural heritage domains. Therefore, in this study, we present a new end-to-end methodology for numerical aggregation QA for genealogical trees that includes: 1) an automatic method for training dataset generation; 2) a transformer-based table selection method, and 3) an optimized transformer-based numerical aggregation QA model. The findings indicate that the proposed architecture, GLOBE, outperforms the state-of-the-art models and pipelines by achieving 87% accuracy for this task compared to only 21% by current state-of-the-art models. This study may have practical implications for genealogical information centers and museums, making genealogical data research easy and scalable for experts as well as the general public.
One of the key AI tools for textual corpora exploration is natural language question-answering (QA). Unlike keyword-based search engines, QA algorithms receive and process natural language questions and produce precise answers to these questions, rather than long lists of documents that need to be manually scanned by the users. State-of-the-art QA algorithms based on DNNs were successfully employed in various domains. However, QA in the genealogical domain is still underexplored, while researchers in this field (and other fields in humanities and social sciences) can highly benefit from the ability to ask questions in natural language, receive concrete answers and gain insights hidden within large corpora. While some research has been recently conducted for factual QA in the genealogical domain, to the best of our knowledge, there is no previous research on the more challenging task of numerical aggregation QA (i.e., answering questions combining aggregation functions, e.g., count, average, max). Numerical aggregation QA is critical for distant reading and analysis for researchers (and the general public) interested in investigating cultural heritage domains. Therefore, in this study, we present a new end-to-end methodology for numerical aggregation QA for genealogical trees that includes: 1) an automatic method for training dataset generation; 2) a transformer-based table selection method, and 3) an optimized transformer-based numerical aggregation QA model. The findings indicate that the proposed architecture, GLOBE, outperforms the state-of-the-art models and pipelines by achieving 87% accuracy for this task compared to only 21% by current state-of-the-art models. This study may have practical implications for genealogical information centers and museums, making genealogical data research easy and scalable for experts as well as the general public.
En este artículo se presenta el desarrollo de un chatbot utilizando como red neuronal una versión ligera del modelo BERT denominado DistilBERT, que ayude a estudiantes o profesionales a solventar inquietudes con respecto a matrículas y homologaciones para los estudios de cuarto nivel o posgrados en la Universidad Nacional de Loja (UNL). En este contexto, el proyecto se dividió en dos etapas: en la primera, se hizo una búsqueda bibliográfica en artículos científicos sobre las tecnologías y herramientas compatibles para realizar el ajuste del modelo BERT mediante un entrenamiento en la tarea de preguntas y respuestas; en la segunda etapa, se llevó a cabo el desarrollo del chatbot siguiendo la metodología de Programación Extrema (XP) dividida en cuatro fases: planeación, diseño, codificación y pruebas. En la fase de planeación, se llevó a cabo el entrenamiento del modelo, requisito necesario para la implementación del chatbot. En esta fase se especificaron los parámetros para el entrenamiento modelo y la descripción de forma general del funcionamiento del agente conversacional mediante historias de usuario. En la segunda fase se diseñó la arquitectura, en la que se muestran todos los elementos que formaron parte del chatbot. En la tercera fase se llevó a cabo la programación utilizando los lenguajes de programación Python, JavaScript, Css, Html y el microframework Flask. Finalmente, en la última fase se ejecutaron pruebas de rendimiento, carga y estrés para ver el comportamiento del chatbot al ser sometido a una carga considerable de peticiones.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.