This study explores the possibilities of an AI/ML-based semi-automated indexing system for handling large volumes of documents in a library setting. It uses a Python virtual environment to install and configure an open source automated indexing tool (Annif), which is loaded with the LOD (Linked Open Data) dataset of the Library of Congress Subject Headings (LCSH) as a standard KOS (Knowledge Organization System). The framework uses the Turtle format of LCSH after cleaning the file with Skosify, applies TF-IDF as the backend algorithm, and selects Snowball as the analyzer. Annif was trained on a large set of bibliographic records populated with subject descriptors (MARC tag 650$a) assigned by trained LIS professionals. The training dataset is first processed with MarcEdit to export it in a format suitable for OpenRefine, and then passes through several steps in OpenRefine to produce a bibliographic record set suitable for training Annif. After training, the framework was tested with a bibliographic dataset to measure indexing efficiency, and finally the automated indexing framework was integrated with data wrangling software (OpenRefine) to produce suggested headings on a mass scale. The entire framework is based on open source software, open datasets, and open standards.
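The TF-IDF matching step described above can be sketched in plain Python. This is a minimal illustration, not Annif's implementation: the function names (`build_index`, `suggest`) and the toy `TRAINING` data are hypothetical, Snowball stemming is omitted, and each subject heading is simply represented by the pooled titles of the records it was assigned to.

```python
import math
import re
from collections import Counter

# Hypothetical toy training data: subject heading -> titles indexed with it.
TRAINING = {
    "Cataloging": [
        "Descriptive cataloging of library materials",
        "Cataloging rules and MARC records",
    ],
    "Machine learning": [
        "Machine learning algorithms for text classification",
        "Neural networks and deep learning",
    ],
    "Open access": ["Open access publishing in institutional repositories"],
}


def tokenize(text):
    """Lowercase and split into alphabetic tokens (no stemming in this sketch)."""
    return re.findall(r"[a-z]+", text.lower())


def normalize(vec):
    """Scale a sparse vector to unit length so dot products are cosine scores."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_index(training):
    """Build one TF-IDF vector per subject from its pooled training titles."""
    counts = {
        subject: Counter(tok for title in titles for tok in tokenize(title))
        for subject, titles in training.items()
    }
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    n = len(counts)
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in doc_freq.items()}
    vectors = {
        subject: normalize({t: c * idf[t] for t, c in term_counts.items()})
        for subject, term_counts in counts.items()
    }
    return vectors, idf


def suggest(text, vectors, idf, limit=3):
    """Rank subjects by cosine similarity to the text; drop zero scores."""
    term_counts = Counter(tokenize(text))
    query = normalize({t: c * idf[t] for t, c in term_counts.items() if t in idf})
    scores = {
        subject: sum(w * vec.get(t, 0.0) for t, w in query.items())
        for subject, vec in vectors.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(subject, score) for subject, score in ranked[:limit] if score > 0]


vectors, idf = build_index(TRAINING)
for subject, score in suggest("Introduction to machine learning for text", vectors, idf):
    print(f"{score:.3f}  {subject}")
```

In a semi-automated workflow of the kind described, ranked suggestions like these would be reviewed by an indexer rather than accepted automatically.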
Authority control for bibliographic data management is generally a neglected area in Indian libraries, and as a result library OPACs in the country (including the OPAC of the National Library) support only the finding function of a catalogue and not the collocating function. In this context, Part II of this three-part series on library carpentry (Part I was published in the April issue) is an attempt to apply library carpentry methods to building authority datasets from scratch. It deals with methodologies for developing authority datasets by applying data wrangling techniques and subsequently transforming these datasets into ready-to-import MARC 21 format (for authority data). Like the previous part of the series, this research study is presented through a case study. The case study describes the development of geographic name authority datasets for states and union territories (level I), districts (level II), sub-districts (level III), and community development blocks (level IV) of India. It also demonstrates how the merged geographic name authority file for India can be implemented in an open source ILS and can become instrumental in enhancing retrieval efficiency through the geodetic search feature of an open source library discovery system. It concludes that the proposed mechanisms and methodology (supported by proofs of concept) may lead to a new era of authority-controlled cataloguing in Indian libraries.
Physical stacks in academic libraries, despite the advent of digital repositories, remain important to users, especially in countries like India where physical resources hold considerable value. This study seeks to develop a system that enables users to physically locate books by integrating stack map functionality with the Koha OPAC. In addition, the study shows how an open-source text analytics server can be incorporated into the Koha OPAC to generate various word-level visualizations by analyzing a text corpus, including the identification of geospatial features such as place names. This research aims to contribute to the advancement of information retrieval and visualization techniques in academic library OPACs, and to improve the user experience in locating physical resources. (The video abstract of this paper may be found at: https://youtu.be/q940TUkcTTE ).
The domain of library and information science is always on the move, and LIS professionals are ardent users of emerging technologies. This research work discusses an emerging possibility in the LIS domain, which applies data science principles and techniques to the bibliographic world. The concept is known as library carpentry and involves different data wrangling techniques for gaining insight into bibliographic datasets. The discussion starts with the basic concepts of library carpentry and systematically presents its components and methods with the help of three case studies. The case studies represent a variety of actual problem-solving projects that use open datasets and an open source data wrangling tool called OpenRefine. Case study I deals with the application of library carpentry to e-book selection by taking socio-academic web space data into consideration; case study II shows how an overview of institutional contributions to the open access domain can be obtained quickly by applying library carpentry methods; and case study III demonstrates the process of gender analysis with the help of a name-to-gender inference service and data wrangling techniques. Each case study is backed by a comprehensive and representative dataset to support and promote real-life problem solving in the professional sphere through library carpentry methods.
This research study is an attempt to develop a MARC-formatted authority dataset for Indian geo-administrative units, given the inadequate coverage of Indian place names in global authority datasets. It starts with an authenticated place names file in CSV format and applies data wrangling tools and techniques to fetch geospatial data and other related datasets from open access data sources, in order to develop a geographic name authority file for Indian place names with geocoordinate values. The study then demonstrates how that authority dataset can be implemented in an open-source ILS and how the retrieval features of a library discovery system can be enhanced through a geodetic search interface that uses the authority dataset. The entire methodology is based on open data, open-source software, and open standards.
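The CSV-to-authority transformation described above might look roughly like the following sketch. It is an assumption-laden illustration rather than the study's actual workflow: the column names, the sample rows (with approximate, illustrative coordinates), the helper names, and the choice of MARC 21 fields 151 (geographic name heading) and 034 (coordinates in decimal degrees, subfields $d to $g) are all assumptions, and the output is a simple text layout modelled on the mnemonic form used by tools such as MarcEdit.

```python
import csv
import io

# Hypothetical input file; coordinates are approximate and for illustration only.
SAMPLE_CSV = """name,level,parent,latitude,longitude
West Bengal,state,India,22.99,87.86
Nadia,district,West Bengal,23.47,88.56
"""


def to_marc_decimal(value, positive, negative):
    """Format a coordinate as hemisphere letter + ddd.dddddd decimal degrees."""
    hemisphere = positive if value >= 0 else negative
    return f"{hemisphere}{abs(value):010.6f}"


def row_to_fields(row):
    """Map one CSV row to (tag, subfields) pairs for a point location."""
    lon = to_marc_decimal(float(row["longitude"]), "E", "W")
    lat = to_marc_decimal(float(row["latitude"]), "N", "S")
    return [
        # Assumed heading field for a geographic name.
        ("151", [("a", row["name"])]),
        # For a single point, west/east and north/south limits coincide.
        ("034", [("d", lon), ("e", lon), ("f", lat), ("g", lat)]),
    ]


def render(fields):
    """Render fields as text lines; blank indicators are shown as backslashes."""
    lines = []
    for tag, subfields in fields:
        body = "".join(f"${code}{value}" for code, value in subfields)
        lines.append(f"={tag}  \\\\{body}")
    return "\n".join(lines)


records = [row_to_fields(row) for row in csv.DictReader(io.StringIO(SAMPLE_CSV))]
for fields in records:
    print(render(fields))
    print()
```

A real authority record would also need a leader, control fields, and source citations, and indicator values should be checked against the MARC 21 authority format before import into an ILS.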