Abstract. Advances in storage and processing capacity have led to the emergence of many multimedia repositories and, consequently, to the need for new approaches to information retrieval. In particular, spoken document retrieval is a very complex task, since existing speech recognition systems tend to generate numerous transcription errors (word substitutions, insertions, and deletions). To deal with these errors, this paper proposes an enriched document representation based on a phonetic codification of the automatic transcriptions. This representation reduces the impact of transcription errors by mapping words with similar pronunciations to the same phonetic code. Experimental results on the CL-SR corpus from CLEF 2007 (33 test topics and 8,104 English interviews) are encouraging: our method achieved a mean average precision of 0.0795, outperforming all but one of the systems evaluated at that forum.
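To illustrate the general idea of phonetic codification (the abstract does not name the specific coding scheme, so Soundex is used here only as a representative example, not as the method of the paper), the sketch below is a minimal Python implementation of standard American Soundex, which assigns the same key to words with similar pronunciations.

```python
def soundex(word: str) -> str:
    """Return the 4-character Soundex code for an English word (minimal sketch)."""
    # Consonant-to-digit map of standard American Soundex; vowels, h, w, y are dropped.
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    word = word.lower()
    if not word or not word[0].isalpha():
        return ""
    first = word[0].upper()          # the first letter is kept as-is
    digits = []
    prev = codes.get(word[0], "")    # code of the first letter, to avoid duplicating it
    for ch in word[1:]:
        code = codes.get(ch, "")
        if ch in "hw":               # h and w do not separate equal adjacent codes
            continue
        if code and code != prev:    # collapse runs of the same code
            digits.append(code)
        prev = code                  # vowels reset prev, so repeated codes across vowels count twice
    return (first + "".join(digits) + "000")[:4]
```

Under this coding, a word and a plausible misrecognition of it can collapse to the same key, e.g. `soundex("Robert") == soundex("Rupert") == "R163"`, which is the kind of tolerance to transcription errors the enriched representation exploits.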