1989
DOI: 10.1093/nar/17.10.3951

Sequence errors described in GenBank: a means to determine the accuracy of DNA sequence interpretation

Abstract: The accuracy of nucleic acid sequence data interpretation was determined by assessing and quantifying the discrepancies reported in the GenBank database. This permitted the calculation of an Error Rate (ER) for nucleic acid sequence determination. If one assumes that most entries (TB, Total Bases) were independently verified or those without reported discrepancies were correct, the ER is 0.368 errors per 1000 bases. However, if one assumes that only those sequences with reported discrepancies (TBIQ, Total Base…
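As a rough illustration of the arithmetic behind an error rate expressed per 1000 bases, the sketch below computes such a rate from a count of reported discrepancies and a base total, and converts it to a percentage so it can be compared with figures quoted in percent. The function names and the example counts are hypothetical and are not taken from the paper; only the 0.368 figure comes from the abstract.

```python
def error_rate_per_kb(errors: int, total_bases: int) -> float:
    """Return the number of errors per 1000 bases.

    `errors` and `total_bases` are illustrative inputs; the paper's
    actual counts are not reproduced here.
    """
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    return 1000.0 * errors / total_bases


def per_kb_to_percent(rate_per_kb: float) -> float:
    """Convert a per-1000-bases rate to a percentage of bases."""
    return rate_per_kb / 10.0


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the calculation.
    print(error_rate_per_kb(errors=5, total_bases=10_000))
    # The ER quoted in the abstract, expressed as a percentage.
    print(per_kb_to_percent(0.368))
```

The choice of denominator drives the result: the same number of reported discrepancies gives a much smaller rate when divided by all bases in the database than when divided only by the bases of entries that carry discrepancy reports.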

Cited by 31 publications (11 citation statements)
References 11 publications
“…Much work remains to be done both in the analytic aspects of modeling and in the empirical realm of cataloging sequence errors (see Krawetz, 1989). The procedures described here can be implemented as a first step.…”
Section: Discussion
mentioning
confidence: 99%
“…This is similar to the error rates of 3.6 and 3.2% reported by Kristensen et al (1992) and Lamperti et al (1992), respectively, in vector sequences that had contaminated the sequence databases. Krawetz (1989), however, reports an error rate of only 0.29% for GenBank on the basis of the frequency of annotated conflicts and revisions, and Beck (1993) reports error rates averaging 0.46% for resequencing of cosmids containing human genomic DNA by independent laboratories.…”
mentioning
confidence: 99%
“…Over the years, the use and development of genetic approaches have resulted in the generation of large amounts of genetic data, which has been made publically available in repositories such as GenBank [2], providing a unique and very valuable resource for the research community. The utility of such public data, produced by others and from several different researchers, relies heavily on the assumption of high data quality [3]. However, although much has been accomplished in terms of minimizing their prevalence, sequence errors are still an important issue for both Sanger and next generation sequencing data [47].…”
Section: Introduction
mentioning
confidence: 99%