Adding machine learning and knowledge intensive techniques to a digital library service

Esposito, Floriana; Malerba, Donato; Semeraro, Giovanni; Fanizzi, Nicola; Ferilli, Stefano

doi:10.1007/s007990050033

Cited by 25 publications (4 citation statements) · References 24 publications

“…If good results will be obtained, it is possible thinking to carry out experiments that take advantage also from the structure of semi-structured documents. Indeed, we are involved in the project CDL (Esposito et al, 1998;Costabile et al, 1999), that could profit by this kind of techniques as regard semantic indexing of the stored documents (cf. (Chanod, 1999)).…”

Section: Discussion (mentioning; confidence: 99%)

Learning from parsed sentences with INTHELEX

Esposito, Ferilli, Fanizzi, et al. (2000)

Proceedings of the 2nd Workshop on Learning Language in Logic and the 4th Conference on Computational Natural Language Learning

In the context of language learning, we address a logical approach to information extraction. The system INTHELEX, used to carry out this task, requires a logic representation of sentences to run the learning algorithm. Hence, the need for parsers to produce structured representations from raw text. This led us to develop a prototypical Italian language parser, as a preprocessor in order to obtain the structured representation of sentences required for the symbolic learner to work. A preliminary experimentation proved that the logic approach to learning from language is able to capture the semantics underlying the kind of sentences that were processed, even if a comparison with classical methods as regards efficiency has still to be done.

“…Since the advent of OPACs, more and more libraries have provided the OPAC service for their services, and the OPAC has also become an important symbol of digital libraries. OPAC offers people with an additional option to search for the online information, especially for searching academic information, such as e-books and academic papers (Esposito et al, 1998). Users also have different information activities on the OPAC, such as searching for information, browsing the information, and gaining the knowledge.…”

Section: User's Information Behavior in OPAC (mentioning; confidence: 99%)

“…Machine learning has gained more and more attention in the field of digital libraries. The theory and technology of machine learning can provide valuable support for digital library to develop more intelligent digital services (Esposito et al, 1998). Li et al (2009) used a semisupervised machine learning framework, combining with traditional literature retrieval methods to construct a ranking model for document retrieval structures based on semi-supervised learning of library user preferences.…”

Section: Prediction of Cross-device Transition (mentioning; confidence: 99%)

Predicting Academic Digital Library OPAC Users' Cross-device Transitions

Liang (2019)

Data and Information Management

With more and more users using different devices, such as personal computers, iPads, and smartphones, they can access OPAC (online public access catalog) services and other digital library services in different contexts. This leads to the phenomenon that user’s behavior can be transferred to different devices, which leads to the richness and diversity of user’s behavior data in digital libraries. A large number of user data challenge digital libraries to analyze user’s behavior, such as search preferences and borrowing habits. In this study, we study the user’s cross-device transition behavior when using OPAC. Based on the large-scale OPAC transaction log, the online activities between device transitions in the process of using OPAC are studied. In order to predict the follow-up activities that users may take, and the next device that users may use, we detect features from several perspectives and analyze the feature importance. We find that the activity and time interval on the first device are more important for predicting the user’s next activity and the next device. In addition, features of operating system help to better predict the next device. The next device used is more likely to predict the next activity after the device transition. This study examines the cross-device transition prediction in library OPAC, which can help libraries provide smart services for users when accessing OPAC on different devices.

“…To apply machine learning (ML) to one of the standard DL circulation activities, namely text categorization [48], is part of the cognitive toolbox deployed [18]. In this context, ML is extensively being experimented with in different development areas and scenarios; to name but a few, for extracting image content from figures in scientific documents for categorization [33,34], automatically assessing and characterizing resource quality for educational DL [54,5], assessing the quality of scientific conferences [37], web-based collection development [42], automated document metadata extraction by support vector machines (SVM, [24]), automatic extraction of titles from general documents [27], information architecture [17], to remove duplicate documents [9], for collaborative filtering [59], for the automatic expansion of domain-specific lexicons by term categorization [3], for generating visual thesauri [45], or the semantic markup of documents [13].…”

Section: Introduction (mentioning; confidence: 99%)

Using wavelet analysis for text categorization in digital libraries: a first experiment with Strathprints

2012

Digital libraries increasingly benefit from research on automated text categorization for improved access. Such research is typically carried out by using standard test collections. In this paper we present a pilot experiment of replacing such test collections by a set of 6000 objects from a real-world digital repository, indexed by Library of Congress Subject Headings, and test support vector machines in a supervised learning setting for their ability to reproduce the existing classification. To augment the standard approach, we introduce a combination of two novel elements: using functions for document content representation in Hilbert space, and adding extra semantics from lexical resources to the representation. Results suggest that wavelet-based kernels slightly outperformed traditional kernels on classification reconstruction from abstracts and vice versa from full-text documents, the latter outcome due to word sense ambiguity. The practical implementation of our methodological framework enhances the analysis and representation of specific knowledge relevant to large-scale digital collections, in this case the thematic coverage of the collections. Representation of specific knowledge about digital collections is one of the basic elements of the persistent archives and the less studied one (compared to representations of digital objects and collections). Our research is an initial step in this direction developing further the methodological approach and demonstrating that text categorisation can be applied to analyse the thematic coverage in digital repositories.
