Abstract. In the context of large and ever growing archives, generating annotation suggestions automatically from textual resources related to the documents to be archived is an interesting option in theory. It could save a lot of work in the time-consuming and expensive task of manual annotation and it could help cataloguers attain a higher inter annotator agreement. However, some questions arise in practice: what is the quality of the automatically produced annotations? How do they compare with manual annotations and with the requirements for annotation that were defined in the archive? If different from the manual annotations, are the automatic annotations wrong? In the CHOICE project, partially hosted at the Netherlands Institute for Sound and Vision, the Dutch public archive for audiovisual broadcasts, we automatically generate annotation suggestions for cataloguers. In this paper, we define three types of evaluation of these annotation suggestions: (1) a classic and strict precision/recall measure expressing the overlap between automatically generated keywords and the manual annotations, (2) a loosened precision/recall measure for which semantically very similar annotations are also considered as relevant matches, (3) an in-use evaluation of the usefulness of manual versus automatic annotations in the context of Serendipitous Browsing. During serendipitous browsing the annotations (manual or automatic) are used to retrieve and visualize semantically related documents. ContextThe Netherlands Institute for Sound and Vision (henceforth S&V) is in charge of archiving publicly broadcasted TV and radio programs in the Netherlands. Two years ago the audiovisual production and archiving environment changed from analogue towards digital data. This effectively quadrupled the inflow of archival material and as such the amount of work for cataloguers. The two most important customer groups are: 1) professional users from the public broadcasters and 2) users from science and education. These typically have three kinds of user queries:
Despite its scientific, political, and practical value, comprehensive information about human languages, in all their variety and complexity, is not readily obtainable and searchable. One reason is that many language data are collected as audio and video recordings which imposes a challenge to document indexing and retrieval. Annotation of multimedia data provides an opportunity for making the semantics explicit and facilitates the searching of multimedia documents. We have developed OntoELAN, an ontology-based linguistic multimedia annotator that features: (1) support for loading and displaying ontologies specified in OWL; (2) creation of a language profile, which allows a user to choose a subset of terms from an ontology and conveniently rename them if needed; (3) creation of ontological tiers, which can be annotated with profile terms and, therefore, corresponding ontological terms; and (4) saving annotations in the XML format as Multimedia Ontology class instances and, linked to them, class instances of other ontologies used in ontological tiers. To our best knowledge, OntoELAN is the first audio/video annotation tool in linguistic domain that provides support for ontology-based annotation.
Abstract. In this article we report on a user study aimed at evaluating and improving a thesaurus browser. The browser is intended to be used by documentalists of a large public audio-visual archive for finding appropriate indexing terms for TV programs. The subjects involved in the study were documentalists of the institutions involved. The study provides insight into the value of various thesaurus browsing and searching techniques.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.