2004
DOI: 10.1016/j.febslet.2004.12.046

Comparison of the current RefSeq, Ensembl and EST databases for counting genes and gene discovery

Abstract: Large amounts of refined sequence material in the form of predicted, curated and annotated genes and expressed sequence tags (ESTs) have recently been added to the NCBI databases. We matched the transcript sequences of RefSeq, Ensembl and dbEST in an attempt to provide an updated overview of how many unique human genes can be found. The results indicate that there are about 25 000 unique genes in the union of RefSeq and Ensembl with 12-18% and 8-13% of the genes in each set unique to the other set, respective…
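The union-and-overlap counting described in the abstract can be sketched as below. This is a minimal illustration, assuming the RefSeq and Ensembl transcripts have already been matched and collapsed to shared gene identifiers; the sequence-matching step itself is not shown. The function name `compare_gene_sets` and the toy identifiers are hypothetical and are not code or data from the paper.

```python
from typing import NamedTuple


class SetComparison(NamedTuple):
    union_size: int             # number of distinct genes in either set
    frac_only_in_first: float   # share of the first set with no match in the second
    frac_only_in_second: float  # share of the second set with no match in the first


def compare_gene_sets(first: set[str], second: set[str]) -> SetComparison:
    """Hypothetical helper: union size plus the fraction of each set absent from the other."""
    if not first or not second:
        raise ValueError("both gene sets must be non-empty")
    return SetComparison(
        union_size=len(first | second),
        frac_only_in_first=len(first - second) / len(first),
        frac_only_in_second=len(second - first) / len(second),
    )


# Toy identifiers for illustration only; NOT results from the paper.
refseq_genes = {"G1", "G2", "G3", "G4", "G5"}
ensembl_genes = {"G3", "G4", "G5", "G6"}

result = compare_gene_sets(refseq_genes, ensembl_genes)
print(result)
```

With these toy sets the union holds 6 genes, 40% of the first set is absent from the second, and 25% of the second set is absent from the first.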

Cited by 32 publications (21 citation statements). References 34 publications.
“…Paired-end reads were mapped to the RefSeq database (National Center for Biotechnology Information (NCBI) build 37) using the Burrows-Wheeler Aligner (BWA) software with default parameters that allow up to 3 alignments for each read and up to 2 mismatches for the seed sequence (the first 25 bp of each read) [15]. Reads that failed to map to RefSeq were mapped to the Ensembl database, which includes additional transcripts and pseudogenes [16]. Remaining unmapped reads were mapped to the human genome assembly (NCBI build 37).…”
Section: Methods (mentioning)
confidence: 99%
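The tiered strategy quoted above (RefSeq first, then Ensembl for reads that failed to map, then the genome assembly for whatever remains) amounts to passing only the still-unmapped reads to each successive reference. The sketch below assumes a hypothetical `tiered_mapping` helper and replaces BWA with exact substring matching (`substring_aligner`, also hypothetical), so it ignores mismatches, seed length, multiple alignments and read pairing; all sequences are invented for illustration.

```python
from collections.abc import Callable


def substring_aligner(read: str, reference: str) -> bool:
    """Toy stand-in for a real aligner such as BWA: exact substring match only."""
    return read in reference


def tiered_mapping(
    reads: dict[str, str],
    tiers: list[tuple[str, list[str]]],
    aligner: Callable[[str, str], bool] = substring_aligner,
) -> dict[str, str]:
    """Assign each read ID to the first tier whose references it maps to."""
    assignments: dict[str, str] = {}
    remaining = dict(reads)  # read ID -> sequence, still unmapped
    for tier_name, references in tiers:
        still_unmapped: dict[str, str] = {}
        for read_id, seq in remaining.items():
            if any(aligner(seq, ref) for ref in references):
                assignments[read_id] = tier_name
            else:
                still_unmapped[read_id] = seq
        remaining = still_unmapped  # only failures go on to the next tier
    for read_id in remaining:
        assignments[read_id] = "unmapped"
    return assignments


# Invented sequences; the tier order mirrors the quoted RefSeq -> Ensembl -> genome cascade.
tiers = [
    ("RefSeq", ["ACGTACGTAA"]),
    ("Ensembl", ["TTTTGGGGCC"]),
    ("genome", ["ACGTACGTAATTTTGGGGCCGATTACA"]),
]
reads = {"r1": "CGTACG", "r2": "TGGGG", "r3": "GATTAC", "r4": "CCCCCC"}
result = tiered_mapping(reads, tiers)
print(result)
```

Because each tier only sees reads rejected by the previous one, a read matching both RefSeq and the genome (like `r1`) is credited to RefSeq, which is the point of putting the curated transcript set first.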
“…Today alternative splicing mechanisms, including exon skipping, alternative exon insertions, use of alternative 5′ splice site and 3′ splice site, and intron retention, are known to be one of the most important mechanisms in providing complexity of eukaryotic proteomes. These mechanisms facilitate the production of a much higher number of possible proteins than 25,000-30,000, which are the number of protein coding genes that have been identified in the human genome (International Human Genome Sequencing Consortium, 2004; Larsson et al., 2005). Estimations from several studies conclude that 40-60% of the human genes undergo alternative splicing, most of which affect the coding sequence leading to the formation of either functional or non-functional protein products.…”
Section: Introduction (mentioning)
confidence: 98%
“…There are several examples of domain changes in the N termini through alternative splicing [21,25,93] and the fact that their introns are frequent and long make them very interesting to study with respect to non-classical transcription. The increasing number of ESTs and the possibility of using genome assembly to aid alignment of expression and gene databases has provided good opportunities to study alternative splice variants [79,94]. In a recent study, we used ESTs and full-length mRNA sequences to systematically analyse splice variants for the Adhesion GPCRs.…”
Section: Alternative Splicing and Role Of Introns (mentioning)
confidence: 99%