Pacific Biosciences Fusion and Long Isoform Pipeline for Cancer Transcriptome–Based Resolution of Isoform Complexity

Miller, Anthony R.; Wijeratne, Saranga; McGrath, Sean; Schieffer, Kathleen M.; Miller, Katherine E.; Lee, Kristy; Mathew, Mariam; LaHaye, Stephanie; Fitch, James; Kelly, Benjamin J.; White, Peter; Mardis, Elaine R.; Wilson, Richard K.; Cottrell, Catherine E.; Magrini, Vincent

doi:10.1016/j.jmoldx.2022.09.003

Cited by 6 publications

(9 citation statements)

References 59 publications

“…We compared data from PacBio sequencing of three replicates of a PB_FLIC-Seq library (Fig. 1) prepared with the HBR_SIRV commercial reference RNA (“HBR_SIRV_Con”) to a previously published HBR_SIRV Iso-Seq dataset (“HBR_SIRV_Non”) [ 17 ].

Fig.…”

Section: Results (mentioning)

confidence: 99%

“…6) Samples undergo PacBio SMRTbell prep, sequencing, and PacBio primary analysis including SKERA for read splitting. Deconcatenated reads are used as input into the PacBio Fusion and Long Isoform Pipeline “PB_FLIP” [ 17 ] for isoform characterization and gene fusion detection …”

Section: Results (mentioning)

confidence: 99%

“…This latter observation would indicate a potential issue with incomplete de-concatenation resulting in artificial fusion transcripts. To set a baseline level of total fusions identified and the gene partners involved, we processed the HBR_SIRV_Non data through our long-read fusion detection analytical pipeline, PB_FLIP, [ 17 ]. No fusion transcripts were predicted for one of the HBR_SIRV_Non samples, whereas the other two replicates had a total of seven unique fusion transcripts predicted (Additional file 6 ).…”

Section: Results (mentioning)

confidence: 99%

“…The s-reads for the tumor (9,714,94) and paired non-malignant adjacent tissue (10,547,041) were processed through our long-read analysis pipeline, PB_FLIP [ 17 ], for isoform characterization (Additional files 7 and 8 ). We initially assessed MGMT expression between the two tissue samples.…”

Section: Results (mentioning)

confidence: 99%

“…The Iso-Seq protocol requires no computational assembly of reads, as the method preserves both the full-length expressed exonic order and orientation of each transcript. As such, Iso-Seq data represents native transcripts that can accurately represent novel isoforms [ 17 – 22 ]. The improvement in characterization of expressed isoforms by full-length long-read sequencing comes at the expense of comparatively lower read output per instrument run, which can limit the detection of moderate to lowly-expressed isoforms [ 23 ].…”

Section: Introduction (mentioning)

confidence: 99%

Full-length isoform concatenation sequencing to resolve cancer transcriptome complexity

Wijeratne, Gonzalez, Roach et al. 2024. BMC Genomics.

Self Cite

Background: Cancers exhibit complex transcriptomes with aberrant splicing that induces isoform-level differential expression compared to non-diseased tissues. Transcriptomic profiling using short-read sequencing has utility in providing a cost-effective approach for evaluating isoform expression, although short-read assembly displays limitations in the accurate inference of full-length transcripts. Long-read RNA sequencing (Iso-Seq), using the Pacific Biosciences (PacBio) platform, can overcome such limitations by providing full-length isoform sequence resolution which requires no read assembly and represents native expressed transcripts. A constraint of the Iso-Seq protocol is due to fewer reads output per instrument run, which, as an example, can consequently affect the detection of lowly expressed transcripts. To address these deficiencies, we developed a concatenation workflow, PacBio Full-Length Isoform Concatemer Sequencing (PB_FLIC-Seq), designed to increase the number of unique, sequenced PacBio long-reads thereby improving overall detection of unique isoforms. In addition, we anticipate that the increase in read depth will help improve the detection of moderate to low-level expressed isoforms.

Results: In sequencing a commercial reference (Spike-In RNA Variants; SIRV) with known isoform complexity we demonstrated a 3.4-fold increase in read output per run and improved SIRV recall when using the PB_FLIC-Seq method compared to the same samples processed with the Iso-Seq protocol. We applied this protocol to a translational cancer case, also demonstrating the utility of the PB_FLIC-Seq method for identifying differential full-length isoform expression in a pediatric diffuse midline glioma compared to its adjacent non-malignant tissue. Our data analysis revealed increased expression of extracellular matrix (ECM) genes within the tumor sample, including an isoform of the Secreted Protein Acidic and Cysteine Rich (SPARC) gene that was expressed 11,676-fold higher than in the adjacent non-malignant tissue. Finally, by using the PB_FLIC-Seq method, we detected several cancer-specific novel isoforms.

Conclusion: This work describes a concatenation-based methodology for increasing the number of sequenced full-length isoform reads on the PacBio platform, yielding improved discovery of expressed isoforms. We applied this workflow to profile the transcriptome of a pediatric diffuse midline glioma and adjacent non-malignant tissue. Our findings of cancer-specific novel isoform expression further highlight the importance of long-read sequencing for characterization of complex tumor transcriptomes.

Long-read RNA sequencing: A transformative technology for exploring transcriptome complexity in human diseases

Ament, DeBruyne, Wang et al. 2024. Molecular Therapy.

SAKit: An all-in-one analysis pipeline for identifying novel proteins resulting from variant events at both large and small scales

Li, Wang et al. 2024. J. Bioinform. Comput. Biol.

Background: Genetic mutations that cause the inactivation or aberrant activation of essential proteins may trigger alterations or even dysfunctions in cellular signaling pathways, culminating in the development of precancerous lesions and cancer. Mutations and such dysfunctions can result in the generation of “novel proteins” that are not part of the conventional human proteome. Identification of these proteins carries a profound potential for unraveling promising drug targets and designing innovative therapeutic models. Despite the emergence of diverse tools for detecting DNA or RNA variants, facilitated by the widespread adoption of nucleotide sequencing technology, these methods primarily target point mutations and exhibit suboptimal performance in detecting large-scale and combinatorial mutations. Additionally, the outcomes of these tools are confined to the genome and transcriptome levels, and do not provide the corresponding protein information resulting from genetic alterations.

Results: We present the development of Sequencing Analysis Kit (SAKit), a bioinformatics pipeline for hybrid sequencing analysis integrating long-read and short-read RNA sequencing data. Long reads are utilized for detecting large-scale variations such as gene fusions, exon skipping, intron retention, and aberrant expression in non-coding regions, owing to their excellent coverage capabilities. Short reads serve to validate these findings at breakpoints and splice junctions. Conversely, short reads are employed for identifying small-scale variations, including single nucleotide variants, deletions, and insertions, due to their superior sequencing depth, with long reads providing additional validation. SAKit is designed to perform analyses using inter-species configuration files comprising genome references and annotation data, making it applicable to both human and mouse studies. Furthermore, SAKit implements a hierarchical filtering approach to eliminate low-confidence variants and employs open reading frame (ORF) analysis to translate identified variants into protein sequences.

Conclusion: SAKit is a robust and versatile bioinformatics tool designed for the comprehensive identification of both large-scale and small-scale variants from RNA-seq data, facilitating the discovery of novel proteins. This pipeline integrates analysis of long-read and short-read sequencing data, offering a powerful solution for researchers in genomics and transcriptomics. SAKit is freely accessible and open-source, available through GitHub (https://github.com/therarna/SAKit) and as a Docker image (https://hub.docker.com/repository/docker/therarna). Implemented primarily within a Snakemake framework using Python, SAKit ensures reproducibility, scalability, and ease of use for the scientific community.
