2004
DOI: 10.1007/s10579-004-8682-1

Article: Collating Texts Using Progressive Multiple Alignment

Abstract: To reconstruct a stemma or do any other kind of statistical analysis of a text tradition, one needs accurate data on the variants occurring at each location in each witness. These data are usually obtained from computer collation programs. Existing programs either collate every witness against a base text or divide all texts up into segments as long as the longest variant phrase at each point. These methods do not give ideal data for stemma reconstruction. We describe a better collation algorithm (progressive …

citations
Cited by 22 publications
(9 citation statements)
references
References 36 publications
(30 reference statements)
“…Most importantly, the independence of the alignment results from the order in which the versions are aligned needs more testing. Although no dependence on the order could be witnessed in test cases found in other publications addressing the issue (Spencer and Howe, 2004), it is possible that for example a combination of repeated tokens in versions and a change in the order of their comparison might cause different results. Another issue is testing and benchmarking.…”
Section: Comparing Texts With CollateX (mentioning)
confidence: 90%
“…Protein sequences - not unlike texts in natural language - can be modeled as sequences of symbols, whose differences can be understood as a set of well-defined editing operations (Levenshtein, 1966), which transform one sequence into another and can be computed. The analogy goes even further as the consecutive evaluation of assumed editing operations between protein sequences on the one hand and texts on the other hand bears striking similarities as they often provide the basis for further stemmatic analysis and genetic reasoning (Spencer and Howe, 2004). The 3 Cf.…”
Section: Computer Supported Collation With CollateX (mentioning)
confidence: 99%
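
The editing operations mentioned in the statement above can be made concrete with a short sketch. It computes the Levenshtein distance between two sequences, assuming unit cost for each insertion, deletion, and substitution. The function name `edit_distance` is chosen here for illustration and does not come from the cited papers.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists).

    Assumption: every insertion, deletion, and substitution costs 1.
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty sequence takes i deletions
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]
```

The same function works on characters and on word tokens, so `edit_distance("the quick brown fox".split(), "the quick red fox".split())` counts one substituted word rather than several changed letters.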
“…First of all, the two FineReader outputs (with and without the built-in trainings) have been aligned with the same methodology explained below for the alignments among different engines and we have obtained a new, more accurate FineReader output to be aligned with the other engines. Outputs of the three engines have been aligned by a progressive multiple sequence alignment algorithm, as illustrated in Spencer [29]. The general principle of progressive alignment is that the most similar sequence pairs are aligned first, necessary gaps to align the sequences are fixed and supplementary gaps (with minimal costs) are progressively added to the previous aligned sequences, in order to perform the total alignment.…”
Section: Multiple Alignment and Naive Bayes Classifier (mentioning)
confidence: 99%
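
The general principle described in the last statement (align the most similar pair first, keep existing gaps fixed, then add the remaining sequences one at a time) can be sketched as follows. This is a simplified illustration, not the algorithm of Spencer and Howe (2004) or of the citing paper. The unit gap cost, the rule that a token matches a column if any aligned witness has it there, and the greedy ordering (used in place of a guide tree) are all assumptions, as are the function names.

```python
from itertools import combinations

GAP = None  # placeholder for a gap in an aligned row


def distance(a, b):
    """Levenshtein distance between two token lists (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def add_to_alignment(columns, seq, width):
    """Align seq against existing columns (each holding `width` entries).

    Existing columns are never altered, so earlier gaps stay fixed.
    """
    n, m = len(columns), len(seq)

    def mismatch(i, j):
        # Assumed scoring: free if any aligned witness has this token here.
        return 0 if seq[j - 1] in columns[i - 1] else 1

    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(cost[i - 1][j - 1] + mismatch(i, j),
                             cost[i - 1][j] + 1,   # gap in the new sequence
                             cost[i][j - 1] + 1)   # new all-gap column
    # Trace back from the bottom-right corner to build the merged columns.
    out, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + mismatch(i, j):
            out.append(columns[i - 1] + [seq[j - 1]])
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            out.append(columns[i - 1] + [GAP])
            i -= 1
        else:
            out.append([GAP] * width + [seq[j - 1]])
            j -= 1
    out.reverse()
    return out


def progressive_align(witnesses):
    """Align a dict of name -> token list (at least two witnesses)."""
    names = list(witnesses)
    # Start with the most similar pair.
    first, second = min(combinations(names, 2),
                        key=lambda p: distance(witnesses[p[0]], witnesses[p[1]]))
    order = [first, second]
    remaining = [n for n in names if n not in order]
    # Greedy ordering (assumption): next, the witness closest to any aligned one.
    while remaining:
        nxt = min(remaining,
                  key=lambda n: min(distance(witnesses[n], witnesses[o]) for o in order))
        order.append(nxt)
        remaining.remove(nxt)
    columns = [[tok] for tok in witnesses[order[0]]]
    for k, name in enumerate(order[1:], start=1):
        columns = add_to_alignment(columns, witnesses[name], width=k)
    return {name: [col[k] for col in columns] for k, name in enumerate(order)}
```

Calling `progressive_align` with a dictionary of tokenized witnesses returns one row per witness, all of equal length, with `None` marking gaps. Because the order of addition is chosen greedily, a different scoring rule or tie-break may change the resulting alignment.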