2021
DOI: 10.1093/bioinformatics/btab168

LSTrAP-Kingdom: an automated pipeline to generate annotated gene expression atlases for kingdoms of life

Abstract: Motivation: There are now more than two million RNA sequencing experiments for plants, animals, bacteria and fungi publicly available, allowing us to study gene expression within and across species and kingdoms. However, the tools allowing the download, quality control and annotation of this data for more than one species at a time are currently missing. Results: To remedy this, we present the Large-Scale Transcriptomic Analysi…

Cited by 9 publications (9 citation statements). References: 15 publications.
“…However, the challenge of selecting the appropriate RNA-seq accessions for the dataset remains an issue in practice, as previously alluded to in the introduction. While it is possible to devise an approach to select for RNA-seq accessions from multiple experimental conditions and sample types via annotations inferred from uploaded metadata descriptions (as seen in Goh & Mutwil, 2021), these annotations might not be accurate and available for all accessions due to the lack of format standardization and fixed vocabulary imposed on uploaded metadata. In addition, experimental conditions and sample types applicable to describe RNA-seq accessions can differ significantly from species to species, making the development of said approach difficult.…”
Section: Discussion (mentioning), confidence: 99%
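The statement above refers to selecting accessions through annotations inferred from free-text metadata. A minimal keyword-matching sketch illustrates the idea and its weakness. The vocabulary `ORGAN_KEYWORDS`, the function `annotate_sample` and the example descriptions are hypothetical; this is not LSTrAP-Kingdom's actual annotation method.

```python
from __future__ import annotations

import re

# Hypothetical controlled vocabulary: organ label -> keywords/synonyms to look for.
ORGAN_KEYWORDS = {
    "root": ["root", "roots", "radicle"],
    "leaf": ["leaf", "leaves", "foliage"],
    "flower": ["flower", "flowers", "petal", "inflorescence"],
    "seed": ["seed", "seeds", "embryo", "endosperm"],
}


def annotate_sample(description: str) -> set[str]:
    """Return the organ labels whose keywords occur as whole words in a metadata description."""
    text = description.lower()
    labels = set()
    for organ, keywords in ORGAN_KEYWORDS.items():
        for kw in keywords:
            if re.search(rf"\b{re.escape(kw)}\b", text):
                labels.add(organ)
                break  # one hit is enough for this organ
    return labels
```

For example, `annotate_sample("Total RNA from roots and flowers")` yields `{"root", "flower"}`, while a terse description such as `"Sample 3, control"` yields an empty set. This is exactly the gap the quoted authors describe: without standardized metadata, many accessions cannot be annotated, and the vocabulary would need to differ between species.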
“…We constructed a Pearson correlation coefficient (PCC) based co-expression network for all genes expressed in at least one transcriptome library with a TPM of 1 using the pcc.py script of LSTrAP v1.3 (Goh and Mutwil 2021). We converted the PCC-based co-expression network into a Highest Reciprocal Rank (HRR) co-expression network using parameters of a maximum HRR of 50 and a PCC cut-off of 0.5 with a second-level neighborhood.…”
Section: Methods (mentioning), confidence: 99%
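The network construction described in this statement can be sketched with NumPy. This is a minimal illustration, not the `pcc.py` script of LSTrAP. It assumes that "a TPM of 1" means TPM ≥ 1 in at least one sample, uses the common highest-reciprocal-rank definition (the larger of the two genes' ranks in each other's PCC-sorted partner lists), and omits the second-level neighborhood step. The function name `build_network` is hypothetical.

```python
import numpy as np


def build_network(tpm, max_hrr=50, pcc_cutoff=0.5, min_tpm=1.0):
    """Return HRR co-expression edges as (gene_i, gene_j, hrr, pcc) tuples.

    tpm: genes x samples matrix of TPM values; gene indices refer to its rows.
    """
    tpm = np.asarray(tpm, dtype=float)
    # Keep genes reaching min_tpm in at least one sample (assumed reading of "a TPM of 1").
    idx = np.flatnonzero((tpm >= min_tpm).any(axis=1))
    n = len(idx)
    if n < 2:
        return []
    pcc = np.corrcoef(tpm[idx])  # Pearson correlation for every retained gene pair

    # Rank each gene's partners by descending PCC; the self-pair is pushed to the end.
    scores = pcc.copy()
    np.fill_diagonal(scores, -np.inf)
    order = np.argsort(-scores, axis=1, kind="stable")
    rank = np.empty_like(order)
    rank[np.arange(n)[:, None], order] = np.arange(1, n + 1)  # 1-based ranks per row

    # Highest reciprocal rank: the worse (larger) of the two mutual ranks.
    hrr = np.maximum(rank, rank.T)

    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if hrr[i, j] <= max_hrr and pcc[i, j] >= pcc_cutoff:
                edges.append((int(idx[i]), int(idx[j]), int(hrr[i, j]), float(pcc[i, j])))
    return edges
```

The full genes × genes matrix grows quadratically with gene number, so a production version for whole genomes would likely compute correlations in chunks and vectorise the edge selection.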
“…The complete list is available in Data S1. Gene expression matrices were obtained from a previous study from public databases via our pipeline using recommended thresholds of log10‐normalized number of processed reads and percentage of reads pseudo‐aligned to the reference coding sequences (CDS) (Goh & Mutwil, 2021), which removed samples with low number and percentage of mapping reads. Gene expression was quantified as TPM via pseudoalignment to reference CDS of each species, using kallisto (Bray et al., 2016).…”
Section: Methods (mentioning), confidence: 99%
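The two steps in this statement, a per-sample quality filter and TPM quantification, can be sketched as below. kallisto reports TPM directly in its abundance output; `tpm_from_counts` only restates the standard definition (read counts divided by effective length, rescaled to sum to one million). The function `passes_qc` and its default thresholds (`min_log10_reads=6.0`, `min_pct_aligned=40.0`) are placeholders chosen for illustration, not the recommended values from Goh & Mutwil (2021).

```python
import math

import numpy as np


def tpm_from_counts(counts, eff_lengths):
    """Transcripts per million: length-normalised read rates scaled to sum to 1e6."""
    rate = np.asarray(counts, dtype=float) / np.asarray(eff_lengths, dtype=float)
    return rate / rate.sum() * 1e6


def passes_qc(n_processed, n_pseudoaligned, min_log10_reads=6.0, min_pct_aligned=40.0):
    """Hypothetical per-sample filter on read depth and pseudoalignment rate.

    The default thresholds are illustrative placeholders only.
    """
    if n_processed <= 0:
        return False
    pct_aligned = 100.0 * n_pseudoaligned / n_processed
    return math.log10(n_processed) >= min_log10_reads and pct_aligned >= min_pct_aligned
```

Under these placeholder thresholds, a sample with 2,000,000 processed reads of which 1,500,000 pseudoalign (75%) is kept, whereas one with only 500,000 reads (log10 ≈ 5.7) or with a 10% alignment rate is dropped.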
“…While the data is publicly available, the interfaces have not been specifically designed for bulk data acquisition for whole kingdoms. We addressed this by designing a parallel‐processing pipeline to automatically discover, download, and process RNA‐seq data for the whole plant kingdom in less than a month (Goh & Mutwil, 2021), making the provision and maintenance of a kingdom‐wide resource feasible.…”
Section: Introduction (mentioning), confidence: 99%
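The parallel-processing idea in this statement can be illustrated with a minimal thread-pool sketch. Each accession is handled by a caller-supplied `process` function, a hypothetical stand-in for the discover, download and quantify steps, with a simple retry loop. This is not LSTrAP-Kingdom's code, and `run_parallel`, the worker count and the retry policy are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def run_parallel(
    accessions: Iterable[str],
    process: Callable[[str], str],
    workers: int = 8,
    retries: int = 2,
) -> Dict[str, str]:
    """Run `process` on every accession concurrently, retrying failures."""

    def attempt(acc: str) -> str:
        last_exc = None
        for _ in range(retries + 1):
            try:
                return process(acc)
            except Exception as exc:  # e.g. a transient download error
                last_exc = exc
        return f"failed: {last_exc}"

    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(attempt, acc): acc for acc in accessions}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results
```

Threads suit the I/O-bound download step. Heavier steps such as pseudoalignment typically run as external programs, which a thread can launch and wait on without holding up the other workers.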