2011
DOI: 10.1186/1687-4722-2011-10

System for fast lexical and phonetic spoken term detection in a Czech cultural heritage archive

Abstract: The main objective of the work presented in this paper was to develop a complete system that would fulfill the original vision of the MALACH project: to employ automatic speech recognition and information retrieval techniques to provide improved access to the large video archive containing recorded testimonies of Holocaust survivors. So far, the system has been developed for the Czech part of the archive only. It takes advantage of the state-of-the-art speech recognition system tailored…

Cited by 27 publications (13 citation statements)
References 14 publications
“…First of all, we used the baseline empirical approach to phoneme-based search described in [10]. The first three rows of Tab.…”
Section: Methods (mentioning)
confidence: 99%
“…In this paper, we use the index-based method for determining the occurrence candidates. The method is based on the empirical approach [10], but it is used only to determine the candidates and no relevance score estimation is performed. To index the archive we use the sub-word units, particularly the triplets of phonemes.…”
Section: Term Occurrence Candidates (mentioning)
confidence: 99%
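
The idea in the statement above, indexing an archive by triplets of phonemes and using the index only to propose occurrence candidates, can be illustrated with a minimal sketch. This is not the cited implementation: the function names, the `min_fraction` threshold, the voting scheme, and the toy phoneme strings are all assumptions made for illustration, and no relevance score is estimated.

```python
from collections import defaultdict


def phoneme_trigrams(phonemes):
    """Return all overlapping triplets of phonemes in a sequence."""
    return [tuple(phonemes[i:i + 3]) for i in range(len(phonemes) - 2)]


def build_index(recordings):
    """Map each phoneme triplet to the (recording_id, position) pairs where it starts."""
    index = defaultdict(list)
    for rec_id, phonemes in recordings.items():
        for pos, tri in enumerate(phoneme_trigrams(phonemes)):
            index[tri].append((rec_id, pos))
    return index


def find_candidates(index, query, min_fraction=0.5):
    """Propose (recording_id, start) candidates for a query phoneme sequence.

    Each matching triplet votes for the start position it implies. A start
    is kept when at least `min_fraction` of the query triplets agree on it
    (a hypothetical threshold, chosen here only for illustration).
    """
    query_tris = phoneme_trigrams(query)
    if not query_tris:
        return []
    votes = defaultdict(int)
    for offset, tri in enumerate(query_tris):
        for rec_id, pos in index.get(tri, ()):
            votes[(rec_id, pos - offset)] += 1
    needed = min_fraction * len(query_tris)
    return sorted(key for key, count in votes.items() if count >= needed)


# Toy data: space-separated symbols stand in for recognized phonemes.
recordings = {
    "rec1": "p r a h a j e m j e s t o".split(),
    "rec2": "b r n o".split(),
}
index = build_index(recordings)
candidates = find_candidates(index, "p r a h a".split())
print(candidates)  # [('rec1', 0)]
```

Lowering `min_fraction` admits candidates that share only some triplets with the query, which is one simple way to tolerate recognition errors at the cost of more false candidates.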
“…Lemmatization has been shown to improve the results when dealing with sparse data in the area of information retrieval [15] and spoken term detection [16] in highly inflected languages, on that account the experiments on the effects of lemmatization in the field of topic identification was performed [17]. As a result of these experiments the automatic text lemmatization is also applied in our work.…”
Section: System For Acquisition and Storing Data (mentioning)
confidence: 99%
“…Lemmatization has been shown to improve the results when dealing with sparse data in the area of information retrieval [4] and spoken term detection [10] in highly inflected languages, therefore the effects of lemmatization on topic identification accuracy is studied in the paper. On the other hand, since the system is used for processing large amounts of data, a summarization method was implemented and the effect of using only the summary of an article on the topic identification accuracy is studied.…”
Section: Introduction (mentioning)
confidence: 99%