Information technology has boosted the development of database retrieval in the Chinese digital humanities domain. However, most database providers adopt a system-oriented design pattern, which fails to address the query gaps that arise in users’ retrieval process. This issue seriously hinders the effective use of database retrieval functionalities, particularly among historians and humanities researchers. To address it, we propose UFTDRDH, a novel user-oriented solution based on automatic query formulation (AQF) technologies. It integrates a human–machine interactive module for selecting new query-related expansion terms with a powerful query expansion algorithmic component (UFTDRDH-QEV) optimised by a topic-enhancing relevance feedback model (ToQE). To verify the effectiveness of UFTDRDH, we conduct several comparative experiments, including quantitative evaluations of retrieval efficiency and user satisfaction and qualitative studies of interpretative traceability. The empirical results are multidimensional and robust: they not only show the positive effects of different AQF technologies on gap reduction, with query expansion being the most effective, but also underline the clear advantage of UFTDRDH over traditional system-oriented automatic query expansion across different task contexts. We believe the application of UFTDRDH can further strengthen the research focus on user-centred design and raise the current level of full-text database retrieval in Chinese digital humanities. More broadly, this solution can also be extended to full-text database retrieval in other languages and digital humanities domains.
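As a rough illustration of the general idea behind query expansion with relevance feedback, the following minimal sketch scores candidate terms from a set of feedback documents and appends the best ones to the query. It is not the UFTDRDH-QEV or ToQE algorithm, whose details the abstract does not give; the function name `expand_query`, the TF-IDF-style scoring, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter


def expand_query(query_terms, feedback_docs, corpus, k=3):
    """Return query_terms plus the k best new terms found in feedback_docs.

    Hypothetical helper: each document is a list of tokens, and
    feedback_docs are documents judged relevant (for example, picked
    by the user in an interactive step).
    """
    n_docs = len(corpus)
    # Document frequency of each term over the whole corpus.
    df = Counter(term for doc in corpus for term in set(doc))
    # Term frequency summed over the feedback documents only.
    tf = Counter(term for doc in feedback_docs for term in doc)
    # Frequent-in-feedback, rare-in-corpus terms score highest.
    scores = {
        term: count * math.log((n_docs + 1) / (df[term] + 1))
        for term, count in tf.items()
        if term not in query_terms
    }
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return list(query_terms) + ranked[:k]


# Toy tokenised corpus (illustrative data only).
corpus = [
    ["tang", "poetry", "libai"],
    ["tang", "dynasty", "tax"],
    ["song", "dynasty", "tax"],
    ["tang", "poetry", "dufu"],
]
expanded = expand_query(["tang"], [corpus[0], corpus[3]], corpus, k=2)
print(expanded)
```

In a user-oriented setting, the ranked candidates would be shown to the user for selection rather than appended automatically, which is the role the abstract assigns to the interactive module.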
Understanding historical entity information (e.g., persons, locations, and time) plays an important role in reasoning about the development of historical events. With growing interest in digital humanities and natural language processing, named entity recognition (NER) provides a feasible way to extract these entities automatically from historical texts, especially in Chinese historical research. However, previous approaches are domain-specific, relatively inaccurate, and non-interpretable, which hinders the development of NER for Chinese history. In this paper, we propose a new hybrid deep learning model called the “subword-based ensemble network” (SEN), which incorporates subword information and a novel attention fusion mechanism. Experiments on CMAG, a large self-built Chinese historical corpus, show that SEN achieves the best performance among the advanced models compared, with 93.87% F1-micro and 89.70% F1-macro. Further investigation reveals that SEN generalizes well for NER on Chinese historical texts: its performance is not only relatively insensitive to categories with few annotated labels (e.g., OFI), but it can also accurately capture diverse local and global semantic relations. Our research demonstrates the effectiveness of integrating subword information and attention fusion, which provides a promising solution for the practical use of entity extraction in the Chinese historical domain.
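The gap between the two reported F1 figures is easier to read with the definitions in hand: micro-averaged F1 pools true positives, false positives, and false negatives over all categories, whereas macro-averaged F1 averages the per-category scores, so rare categories weigh as much as frequent ones. The sketch below is a simplified label-level computation on made-up data, not the paper's evaluation script; the function name `f1_scores` and the `PER` label are assumptions (only `OFI` appears in the abstract).

```python
from collections import Counter


def f1_scores(gold, pred):
    """Return (micro_f1, macro_f1) for two equal-length label sequences."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p where it was wrong
            fn[g] += 1  # missed the gold label g

    def f1(t, p_, n):
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when nothing is counted.
        denom = 2 * t + p_ + n
        return 2 * t / denom if denom else 0.0

    # Micro: pool the counts across categories, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per category, then take the unweighted mean.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro


# Illustrative data: a frequent category (PER) and a rare one (OFI).
gold = ["PER"] * 8 + ["OFI"] * 2
pred = ["PER"] * 8 + ["PER", "OFI"]
micro, macro = f1_scores(gold, pred)
print(f"micro={micro:.4f} macro={macro:.4f}")
```

On this toy data a single error in the rare category lowers macro F1 more than micro F1, which is the usual reason a macro score sits below the micro score when some categories have few annotations.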