Proceedings of the Second Workshop on Analytics for Noisy Unstructured Text Data 2008
DOI: 10.1145/1390749.1390762

Data driven methods for improving mono- and cross-lingual IR performance in noisy environments

Cited by 4 publications (3 citation statements)
References 18 publications
“…The notion of skipgram (McNamee, 2008), also referred to as gap-n-gram (Mustafa, 2005) or s-gram (Järvelin et al, 2008) by other authors, is a generalization of the concept of n-gram by allowing skips during the matching process. However, McNamee (2008) showed that skipgrams are dramatically more costly than traditional n-grams and, while performing reasonably well, they are not demonstrably more effective.…”
Section: The N-gram Based Approach
Citation type: mentioning
confidence: 99%
“…Classic Vector Space and Probabilistic models (Manning et al, 2008) are the first options. However, the very special and noisy nature of Egyptian writing system and the application context may suggest the use of other approaches: the use of standard character n-grams as a working unit, a solution successfully applied in both noisy contexts (Vilares et al, 2011) and languages whose writing systems share characteristics with Egyptian, such as Japanese (Ogawa and Matsuda, 1999), Chinese (Foo and Li, 2004), Korean (Lee and Ahn, 1996) or Arabic (Mustafa and Al-Radaideh, 2004); the use of so-called character s-grams (Järvelin et al, 2008), a generalization of the concept of n-gram by allowing skips during the matching process; the application of locality-based models (de Kretser and Moffat, 1999); or phonetic matching (Yasukawa et al, 2012). Closer to the NLP field, the development of conflation mechanisms based on lemmatization or morphological analysis (Piotrowski, 2012, Ch.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…Other IR-related, but more complex, application of n-grams are the use of skipgrams (McNamee, 2008), also referred to as gap-n-grams (Mustafa, 2005) or s-grams (Järvelin et al, 2008) by other authors. This is a generalization of the concept of n-gram by allowing skips during the matching process.…”
Section: Background and Related Work
Citation type: mentioning
confidence: 99%
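
The citation statements above describe skipgrams (gap-n-grams, s-grams) as n-grams that allow skips during matching. A minimal Python sketch of that idea follows. It assumes the simplest reading, character pairs separated by a fixed number of skipped characters, which may differ from the exact formulations in McNamee (2008) or Järvelin et al (2008). The function names char_ngrams, char_sgrams and sgram_profile are hypothetical and chosen only for illustration.

```python
def char_ngrams(text: str, n: int) -> list[str]:
    """Return all contiguous character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def char_sgrams(text: str, skip: int) -> list[str]:
    """Return character pairs with exactly `skip` characters skipped between them.

    Assumption: only two-character grams are modelled. With skip=0 the
    result is the ordinary bigram list.
    """
    gap = skip + 1
    return [text[i] + text[i + gap] for i in range(len(text) - gap)]


def sgram_profile(text: str, max_skip: int) -> set[tuple[int, str]]:
    """Collect (skip, gram) features for every skip from 0 to max_skip."""
    return {
        (skip, gram)
        for skip in range(max_skip + 1)
        for gram in char_sgrams(text, skip)
    }
```

Because sgram_profile adds one extra set of grams per permitted skip length, the number of indexed features grows with max_skip, which gives some intuition for the higher cost relative to plain n-grams that the statements attribute to McNamee (2008).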