Comparative transcriptome analysis is the comparison of expression patterns between homologous genes in different species. Since most molecular mechanistic studies in plants have been performed in model species, including Arabidopsis and rice, comparative transcriptome analysis is particularly important for the functional annotation of genes in other plant species. Many biological processes, such as embryo development, are highly conserved between different plant species. The challenge is to establish a one-to-one mapping of the developmental stages between two species. In this protocol, we solve this problem by converting the gene expression patterns into a co-expression network and then applying network module-finding algorithms to the cross-species co-expression network. We describe how to perform such an analysis using bash scripts for preliminary data processing and the R programming language, in which the simulated annealing method for module finding is implemented. We also provide instructions on how to visualize the resulting co-expression networks across species.

Keywords
Comparative transcriptome analysis, Network, Sequence homology, Arabidopsis, Soybean, Embryo development

... seed embryo expression data from Arabidopsis [22] with data from the same tissue in soybean [23] as a demonstration of how to apply computational tools to comparative transcriptome analysis.

In contrast with the time course data examined here, many other datasets have been reported from "treatment-control" experiments (one time point only, two treatment conditions). For example, soybean roots were treated with drought stress in one experiment [4]. To address the question of functional conservation versus functional divergence within gene families, these soybean root data can be compared with transcriptome data from Arabidopsis roots under a similar stress [24].
This is a relatively simple problem because, in both experiments, we can identify lists of genes that are differentially expressed in response to the same or similar treatments. Identifying conserved co-expressed genes for treatment-control experiments is a simple two-step process. First, one needs to identify a list of gene pairs that are homologous between the two species. A simple BLAST search or a more sophisticated approach such as OMA, EggNOG, or PLAZA [9,10,12] can be used to identify homologous genes. Second, the two lists of differentially expressed genes can be compared to find the homologous pairs in which both genes appear in their respective lists.
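The two-step comparison described above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not part of the protocol, which uses bash and R. The function name `conserved_de_pairs`, the variable names, and all gene identifiers are hypothetical placeholders. The sketch assumes that the homolog pairs (step one) and the two lists of differentially expressed genes have already been produced by other tools.

```python
def conserved_de_pairs(homolog_pairs, de_species_a, de_species_b):
    """Return homolog pairs in which both genes are differentially expressed.

    homolog_pairs: iterable of (gene_in_species_a, gene_in_species_b) tuples,
                   e.g. derived from a BLAST search (step one).
    de_species_a, de_species_b: differentially expressed gene IDs per species.
    """
    # Sets give constant-time membership tests for the comparison (step two).
    de_a = set(de_species_a)
    de_b = set(de_species_b)
    return [(a, b) for a, b in homolog_pairs if a in de_a and b in de_b]


# Toy input: every identifier below is made up for illustration only.
homolog_pairs = [
    ("AT1G00001", "Glyma.01G000100"),
    ("AT1G00002", "Glyma.02G000200"),
    ("AT1G00003", "Glyma.03G000300"),
]
de_arabidopsis = ["AT1G00001", "AT1G00003"]
de_soybean = ["Glyma.01G000100", "Glyma.02G000200"]

conserved = conserved_de_pairs(homolog_pairs, de_arabidopsis, de_soybean)
print(conserved)  # [('AT1G00001', 'Glyma.01G000100')]
```

Only the first pair is reported, because it is the only one whose two genes both appear in their species' list. With many-to-many homology, a gene can occur in several pairs, and each qualifying pair is reported separately.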
In this article, we focus on a more complex scenario: two time-series experiments performed for the same developmental process in two different species [25]. Time course data provide more data points than simple treatment-control experiments and, thus, can reveal relationships based on development between homologous genes in two organisms. However, this is also challenging, because the number of time points in the two experiments ...