Sequence determination of the entire genome of Synechocystis sp. strain PCC6803 was completed. The confirmed total length of the genome was 3,573,470 bp, including the previously reported sequence of 1,003,450 bp from map position 64% to 92% of the genome. The entire sequence was assembled from the sequences of the physical map-based contigs of cosmid clones and of lambda clones and long PCR products which were used for gap-filling. The accuracy of the sequence was ensured by analysis of both strands of DNA through the entire genome. The authenticity of the assembled sequence was supported by restriction analysis of long PCR products, which were directly amplified from the genomic DNA using the assembled sequence data. To predict the potential protein-coding regions, analysis of open reading frames (ORFs), analysis by the GeneMark program and similarity searches against databases were performed. As a result, a total of 3,168 potential protein genes were assigned on the genome, of which 145 (4.6%) were identical to reported genes and 1,257 (39.6%) and 340 (10.8%) showed similarity to reported and hypothetical genes, respectively. The remaining 1,426 (45.0%) had no apparent similarity to any genes in databases. Among the potential protein genes assigned, 128 were related to the genes participating in photosynthetic reactions. The sum of the sequences coding for potential protein genes occupies 87% of the genome length. With the rRNA and tRNA genes added, the genome therefore has a very compact arrangement of protein- and RNA-coding regions. A notable feature of the gene organization of the genome was that 99 ORFs, which showed similarity to transposase genes and could be classified into 6 groups, were found spread all over the genome, and at least 26 of them appeared to remain intact. The result implies that rearrangement of the genome occurred frequently during and after establishment of this species.
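The ORF analysis step mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which also relied on GeneMark and database similarity searches. It only shows a naive six-frame scan for ATG-to-stop stretches. The function names, the ATG-only start rule (bacteria also use alternative start codons such as GTG and TTG), and the `min_codons` cutoff are assumptions made for illustration.

```python
from typing import Iterator, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG"}  # assumption: ATG only; real annotation also considers GTG/TTG

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> Iterator[Tuple[str, int, int]]:
    """Yield (strand, start, end) for each ATG..stop ORF in all six frames.

    Coordinates are 0-based, half-open, and always refer to the input
    (forward) strand. min_codons counts coding codons, excluding the stop.
    """
    seq = seq.upper()
    n = len(seq)
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first start codon of the open ORF
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in START_CODONS:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3  # include the stop codon in the span
                    if (end - start) // 3 - 1 >= min_codons:
                        if strand == "+":
                            yield (strand, start, end)
                        else:
                            # map reverse-strand coordinates back to the forward strand
                            yield (strand, n - end, n - start)
                    start = None


if __name__ == "__main__":
    toy = "CC" + "ATG" + "AAA" * 3 + "TAA" + "G"  # hypothetical toy sequence
    for orf in find_orfs(toy, min_codons=2):
        print(orf)
```

A scan of this kind only proposes candidates; deciding which candidates are genes requires the statistical and similarity-based evidence described in the abstract.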
CyanoBase (http://www.kazusa.or.jp/cyano/) is a database containing genomic information on the cyanobacterium Synechocystis sp. strain PCC6803. It furnishes an annotation for each of the 3,168 protein genes deduced from the entire nucleotide sequence of this genome. Information on the genome can be directly accessed through three different menus: a clickable physical map of the genome, a gene classification list, and a keyword search menu, all of which are accessible from the main page of the database. The entry page for a gene annotation contains the following information: the location of the gene on the genome, the nucleotide and deduced amino acid sequences of the gene, the result of a similarity search, and the classification of the deduced gene product according to its function. This page has reverse-links to the local physical map and gene classification list so that relevant genes can be searched in terms of their location on the genome and their function. The main page of CyanoBase also provides engines for similarity searches between a query sequence and the entire genome sequence and for keyword searches, along with numerous links to pages containing related information.
To extend our cDNA project for accumulating basic information on unidentified human genes, we newly determined the sequences of 100 cDNA clones from a set of size-fractionated human adult and fetal brain cDNA libraries, and predicted the coding sequences of the corresponding genes, named KIAA1019 to KIAA1118. The sequencing of these clones revealed that the average sizes of the inserts and corresponding open reading frames were 5.0 kb and 2.6 kb (880 amino acid residues), respectively. Database searches of the predicted amino acid sequences classified 58 predicted gene products into five functional categories: cell signaling/communication, cell structure/motility, nucleic acid management, protein management and cell division. For 34 gene products, homologues that were similar in sequence through almost their entire regions were detected in the databases. The chromosomal locations of the genes were determined by using human-rodent hybrid panels unless their mapping data were already available in the public databases. The expression profiles of all the genes among 10 human tissues, 8 brain regions (amygdala, corpus callosum, cerebellum, caudate nucleus, hippocampus, substantia nigra, subthalamic nucleus, and thalamus), spinal cord, fetal brain and fetal liver were also examined by reverse transcription-coupled polymerase chain reaction, products of which were quantified by enzyme-linked immunosorbent assay.