Abstract. Today, digital libraries increasingly have to rely on semantic techniques in their workflows for metadata generation, search, and navigational access. However, due to the statistical and/or collaborative nature of such techniques, the quality of the automatically generated metadata is questionable. Since data quality is essential in digital libraries, we present a user study that evaluates metrics for quality assessment on the one hand, and their benefit for the individual user during interaction on the other. To observe the interaction of domain experts in the sample field of chemistry, we transferred the abstract metrics' outcome for a sample semantic technique into three different kinds of visualizations and asked the experts to evaluate these visualizations first without, and later augmented with, the quality information. We show that the generated quality information is not only essential for data quality assurance in the curation step of digital libraries, but will also be helpful for designing intuitive interaction interfaces for end-users.
Keywords: Digital Libraries, Information Quality, Semantic Technologies
Introduction

Digital libraries have to handle a vast amount of data, ranging from individual papers or reports in journals, conference proceedings, etc., up to completely digitized books. Making such data searchable relies mainly on the amount and quality of the provided metadata. On a purely bibliographic level this metadata is relatively easy to derive and maintain; on the content level, in contrast, the problem of deriving correct metadata obviously grows with the density of information. Whereas the information contained in short conference papers can be manually extracted and annotated quite easily, capturing the content of an entire book definitely needs automatic means of extraction. Today, semantic techniques relying on statistical approaches like term co-occurrences or frequencies are already commonplace. But the quality of metadata derived by such techniques is largely uninvestigated. Thus, a main topic of future digital library research has to be the quality assessment of such techniques. Obviously the quality, as in information retrieval's precision/recall analysis, can only be evaluated by comparing a technique's output with manually provided judgments.
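To make the precision/recall evaluation mentioned above concrete, the following minimal sketch compares automatically extracted index terms against a manual gold standard. The helper function and all term lists are illustrative assumptions, not part of the study itself:

```python
# Hedged sketch: precision/recall of automatically generated metadata terms
# against manually provided judgments. All names and data are illustrative.

def precision_recall(extracted, judged):
    """Compare a technique's output with a manual gold standard."""
    extracted, judged = set(extracted), set(judged)
    tp = len(extracted & judged)  # terms the technique got right
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(judged) if judged else 0.0
    return precision, recall

# Hypothetical chemistry example: 2 of 3 extracted terms match the judgments
auto = ["benzene", "toluene", "acetone"]
manual = ["benzene", "toluene", "ethanol", "methanol"]
p, r = precision_recall(auto, manual)
print(round(p, 3), round(r, 3))  # 0.667 0.5
```

The same pattern scales to any automatically derived metadata field, provided a human-annotated reference set exists for comparison.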