1995 International Conference on Acoustics, Speech, and Signal Processing
DOI: 10.1109/icassp.1995.479391

Language modeling by variable length sequences: theoretical formulation and evaluation of multigrams


Cited by 86 publications (58 citation statements)
References 8 publications
“…These strings should rather work as individual vocabulary items in the model. It has been shown that increased performance of n-gram models can be obtained by adding larger units consisting of common word sequences to the vocabulary; see e.g., (Deligne and Bimbot, 1995). Nevertheless, in the near future we wish to explore possibilities of using complementary and more standard evaluation measures, such as precision, recall, and F-measure of the discovered morph boundaries.…”
Section: Discussion
confidence: 99%
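The vocabulary-augmentation idea mentioned in the statement above can be illustrated with a short sketch. This is not code from the cited papers; the corpus format, the frequency threshold, and the underscore-joining convention are assumptions made here for illustration only.

```python
# Minimal sketch (assumed details, not the cited authors' method):
# add frequent word pairs to the vocabulary as single units.
from collections import Counter

def merge_frequent_bigrams(sentences, min_count=5):
    """Replace word pairs occurring at least `min_count` times with a
    single vocabulary item (joined by '_'). `sentences` is a list of
    word lists; `min_count` is an arbitrary illustrative threshold."""
    pair_counts = Counter()
    for words in sentences:
        pair_counts.update(zip(words, words[1:]))
    frequent = {p for p, c in pair_counts.items() if c >= min_count}

    merged_corpus = []
    for words in sentences:
        out, i = [], 0
        while i < len(words):
            # Greedy left-to-right merge of a frequent pair into one token.
            if i + 1 < len(words) and (words[i], words[i + 1]) in frequent:
                out.append(words[i] + "_" + words[i + 1])
                i += 2
            else:
                out.append(words[i])
                i += 1
        merged_corpus.append(out)
    return merged_corpus
```

An n-gram model trained on the merged corpus then treats common word sequences (e.g. "new_york") as ordinary vocabulary items.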
“…Many unsupervised methods have been proposed for segmenting raw character sequences with no boundary information into words [1,2,4,5,8,14,15]. Brent [1] gives a good survey of these methods.…”
Section: Of Tokens (T H E M O S T F A V O U R I T E M U S I C O F A L)
confidence: 99%
“…Therefore, we partition our data into meaningful ngrams first. Based on the work of Deligne and Bimbot [35], we compute multigram models for the documents in our corpus the following way: Each sentence is considered as a sequence of n-grams with variable length. The likelihood of a sentence is computed by summing up the individual likelihoods of the n-grams corresponding to each possible segmentation of the sentence.…”
Section: B. Preprocessing
confidence: 99%
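The likelihood computation described in that statement (summing, over every possible segmentation of a sentence into variable-length units, the product of the unit probabilities) admits a simple forward dynamic program. The sketch below is a hedged illustration of that idea, not the cited implementation; `unit_probs` and `max_len` are hypothetical names introduced here.

```python
# Sketch of the multigram sentence likelihood described above.
# `unit_probs` maps a tuple of words to its multigram probability
# (assumed given); unseen units contribute nothing.

def multigram_likelihood(words, unit_probs, max_len=3):
    """Forward dynamic program:
    like[t] = sum over lengths k of P(unit ending at position t) * like[t-k],
    which totals the product-of-unit probabilities over all segmentations."""
    n = len(words)
    like = [0.0] * (n + 1)
    like[0] = 1.0  # empty prefix has likelihood 1
    for t in range(1, n + 1):
        for k in range(1, min(max_len, t) + 1):
            unit = tuple(words[t - k:t])
            p = unit_probs.get(unit, 0.0)
            if p > 0.0:
                like[t] += p * like[t - k]
    return like[n]
```

For example, with `unit_probs = {('a',): 0.5, ('b',): 0.5, ('a', 'b'): 0.25}`, the sentence `['a', 'b']` gets likelihood 0.25 + 0.25 = 0.5, summing its two segmentations {a}{b} and {a b}.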