Saffron (Crocus sativus) is a spice of immense economic and medicinal relevance, owing to its anticancer and chemopreventive properties. Although the genomic sequence of saffron is not publicly available, the RNA-seq based transcriptome of saffron from Jammu and Kashmir provides several as yet unexplored insights into the metagenome of the plant from that region. In the current work, sequence databases were created in the YeATS suite from the NCBI and Ensembl databases to enable faster comparisons. These were used to determine the metagenome of saffron. Soybean mosaic virus, a potyvirus, was found to be abundantly expressed in all five tissues analyzed. Recent studies have highlighted that the issues arising from latent potyvirus infections in saffron are severely underestimated. Bacterial and fungal identification is complicated by symbiogenesis, especially in the absence of the endogenous genome: symbiogenesis results in transcripts with significant homology to both bacterial and eukaryotic genomes. A stringent criterion based on homology comparison was used to identify bacterial and fungal transcripts, and inferences were constrained to the genus level. Leifsonia, Elizabethkingia and Staphylococcus were among the bacteria identified, while Mycosphaerella and Pyrenophora were among the fungi detected. Among species of these bacterial genera, L. xyli is the causal agent of ratoon stunting disease in sugarcane, while E. meningoseptica and S. haemolyticus, having acquired multiresistance against available antimicrobial agents, are important in clinical settings. Mycosphaerella and Pyrenophora include several pathogenic species. It is shown that a transcript from a heat shock protein of the fungus Cladosporium cladosporioides has been erroneously annotated as a saffron gene. The detection of these pathogens should enable proper strategies for ensuring better yields.
The functional annotation of proteins in the absence of a genome is subject to errors due to the existence of significantly homologous proteins in organisms from different branches of life.
Obtaining the bacterial genomes: "ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/assembly_summary.txt" provides details of the available bacterial genomes. This file was parsed, and the first-occurring species of each genus was chosen as an arbitrary representative (getBacterialGenomes.csh: n=1355 in Dataset1).
Obtaining plant mitochondrial, chloroplast and ribosomal sequences: The NCBI database was queried for: Plants, RefSeq, Mitochondrion/Chloroplast/Ribosomal rRNA/Ribosomal mRNA. The results were combined in a single file (DB.list CHLORO MITO MRIBO RRIBO, n=40049 in Dataset1).
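The genus-level selection of bacterial genomes described above can be illustrated with a minimal Python sketch. This is not the getBacterialGenomes.csh script from Dataset1, whose contents are not shown here. The function name `first_species_per_genus` is hypothetical, and the sketch assumes that the summary file is tab-separated, that its column names appear on a commented header line, and that the columns `organism_name` and `ftp_path` are present. The rows in `sample` are toy placeholders, not real assembly records.

```python
def first_species_per_genus(lines):
    """Map each genus to the first (organism_name, ftp_path) row seen for it.

    Assumes a tab-separated file whose column names are given on a
    '#'-prefixed header line containing 'organism_name'.
    """
    header = None
    chosen = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # Comment lines: keep only the one that lists the column names.
            fields = line.lstrip("#").strip().split("\t")
            if "organism_name" in fields:
                header = fields
            continue
        if header is None:
            raise ValueError("no header line with an organism_name column found")
        row = dict(zip(header, line.split("\t")))
        parts = row.get("organism_name", "").split()
        if not parts:
            continue
        # Simplification: the first word of the organism name is taken as the
        # genus; names such as 'Candidatus ...' would need extra handling.
        genus = parts[0]
        if genus not in chosen:
            chosen[genus] = (row["organism_name"], row.get("ftp_path", ""))
    return chosen


# Toy rows for illustration only (hypothetical accessions and paths).
sample = [
    "# assembly_accession\torganism_name\tftp_path\n",
    "ACC_1\tLeifsonia xyli\tpath/a\n",
    "ACC_2\tLeifsonia aquatica\tpath/b\n",
    "ACC_3\tStaphylococcus haemolyticus\tpath/c\n",
]
selected = first_species_per_genus(sample)
for genus, (name, path) in selected.items():
    print(genus, name, path)
```

Keeping only the first row per genus makes the selection deterministic for a given version of the summary file, although the representative species is still arbitrary with respect to biology.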
Obtaining the fungal sequences: The fungal sequences were obtained from the Ensembl site [16]. A random species was chosen for each genus (list.ensembl.fungi.txt in Dataset1, n=222).
The YeATS suite was used extensively to query these databases using the BLAST command-line interface [17]. The BLAST bitscore was used as a comparison metric instead of the E-value since it allows differentiation for high ho...