2014
DOI: 10.1093/bioinformatics/btu638

HTSeq—a Python framework to work with high-throughput sequencing data

Abstract: Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed. Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant…
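Among the data classes the abstract lists are genomic coordinates. HTSeq documents its `GenomicInterval` as zero-based and end-exclusive, following Python's slicing convention. The sketch below does not use HTSeq. It is a minimal, self-contained stand-in, and the `Interval` class and its methods are hypothetical. It shows why half-open coordinates make interval length and overlap tests simple.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical genomic interval: zero-based start, end-exclusive."""
    chrom: str
    start: int          # first base, 0-based
    end: int            # one past the last base
    strand: str = "."   # "+", "-" or "." (unstranded)

    def __post_init__(self):
        if self.start > self.end:
            raise ValueError("start must not exceed end")

    @property
    def length(self) -> int:
        # With half-open coordinates, length is a plain subtraction.
        return self.end - self.start

    def overlaps(self, other: "Interval") -> bool:
        # Different chromosomes never overlap.
        if self.chrom != other.chrom:
            return False
        # Strand is only compared when both intervals are stranded.
        if "." not in (self.strand, other.strand) and self.strand != other.strand:
            return False
        # Half-open intervals overlap iff each starts before the other ends.
        return self.start < other.end and other.start < self.end
```

For example, `Interval("chr1", 100, 200, "+")` has length 100 and does not overlap the adjacent `Interval("chr1", 200, 300, "+")`, because base 200 belongs only to the second interval.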


Cited by 17,851 publications (12,527 citation statements)
References: 12 publications
“…Mapping of poly(A)-selected reads was performed with TopHat v. 2.0.8b [77] using default settings. Read counts were generated with HTSeq-count v. 0.6.1p1 [78] and differential expression analysis was performed using DESeq2 package v. 1.8.1 [79]. Coverage of RNA reads were analyzed and visualized with Integrative Genomics viewer [80,81].…”
Section: Methods; citation type: mentioning (confidence: 99%)
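The pipeline quoted above passes HTSeq-count output to DESeq2. htseq-count writes one tab-separated `feature<TAB>count` line per feature, followed by summary counters whose names start with `__` (for example `__no_feature` and `__ambiguous`). The sketch below assumes that default two-column layout and uses hypothetical helper names (`read_counts`, `merge_samples`) that are not part of HTSeq. It separates the summary counters from the gene counts and assembles the gene-by-sample count matrix that DESeq2 works from.

```python
from typing import Dict, Iterable, Tuple


def read_counts(lines: Iterable[str]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Split one htseq-count table into per-gene counts and summary counters."""
    genes: Dict[str, int] = {}
    special: Dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        feature, count = line.split("\t")
        # Summary counters such as __no_feature and __ambiguous start with "__".
        target = special if feature.startswith("__") else genes
        target[feature] = int(count)
    return genes, special


def merge_samples(samples: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, int]]:
    """Combine per-sample gene counts into a gene -> {sample: count} matrix.

    Genes missing from a sample are filled with 0.
    """
    all_genes = sorted(set().union(*(counts.keys() for counts in samples.values())))
    return {
        gene: {name: counts.get(gene, 0) for name, counts in samples.items()}
        for gene in all_genes
    }
```

In R, DESeq2 can also read such files directly with `DESeqDataSetFromHTSeqCount`. The Python version only makes the file structure explicit.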
“…DGE (i.e., testing for changes in the overall transcriptional output of a gene) is typically performed by applying a count-based inference method from statistical packages such as edgeR [12] or DESeq2 [11] to gene counts obtained by read counting software such as featureCounts [1], HTSeq-count [2] or functions from the GenomicAlignments [22] R package. A lot has been written about how simple counting approaches are prone to give erroneous results for genes with changes in relative isoform usage, due to the direct dependence of the observed read count on the transcript length [23].…”
Section: Incorporating Transcript-level Estimates Leads To More Accur…; citation type: mentioning (confidence: 99%)
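A toy calculation makes the length dependence in the passage above concrete. It assumes uniform fragment sampling and ignores effective-length and bias corrections. Under those assumptions, a transcript's expected read count is proportional to its number of molecules times its length. The isoform lengths and molecule counts below are invented for illustration.

```python
from typing import Dict


def relative_read_yield(molecules: Dict[str, int], lengths: Dict[str, int]) -> int:
    """Expected reads for a gene, up to a constant factor.

    Assumes uniform fragment sampling: each isoform contributes in
    proportion to (number of molecules) x (transcript length).
    Effective-length and bias corrections are ignored.
    """
    return sum(molecules[iso] * lengths[iso] for iso in molecules)


# Hypothetical two-isoform gene; lengths in nucleotides.
lengths = {"long": 3000, "short": 1000}
# Same total number of transcript molecules in both conditions,
# but condition B has switched entirely to the short isoform.
cond_a = {"long": 100, "short": 0}
cond_b = {"long": 0, "short": 100}

total_a = sum(cond_a.values())
total_b = sum(cond_b.values())
apparent_fold_change = relative_read_yield(cond_a, lengths) / relative_read_yield(cond_b, lengths)
print(total_a, total_b, apparent_fold_change)  # 100 100 3.0
```

In this example the number of transcript molecules does not change, yet a plain per-gene read count would show a 3-fold drop. This is the kind of erroneous result the quoted passage attributes to simple counting.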
“…Currently, one of the most common approaches is to define a set of non-overlapping targets (typically, genes) and use the number of reads overlapping a target as a measure of its abundance, or expression level. Several software packages have been developed for performing such “simple” counting (e.g., featureCounts [1] and HTSeq-count [2]). More recently, the field has seen a surge in methods aimed at quantifying the abundances of individual transcripts (e.g., Cufflinks [3], RSEM [4], BitSeq [5], kallisto [6] and Salmon [7]).…”
Section: Introduction; citation type: mentioning (confidence: 99%)
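The passage above describes “simple” counting of reads that overlap a target. HTSeq-count's default overlap mode is called `union`. For each read, it takes the union of the sets of features that overlap each aligned base. The read is counted for that feature only if the union contains exactly one feature. Otherwise it is tallied as `__no_feature` (empty union) or `__ambiguous` (more than one feature). The pure-Python sketch below illustrates that rule on a single unstranded chromosome with hypothetical gene IDs and coordinates. It is not HTSeq's implementation, and it skips strandedness, alignment-quality filtering and multi-mapping handling.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Block = Tuple[int, int]  # zero-based, end-exclusive (start, end)


def features_at(pos: int, features: Dict[str, List[Block]]) -> Set[str]:
    """IDs of features whose intervals cover a single base position."""
    return {fid for fid, exons in features.items()
            if any(s <= pos < e for s, e in exons)}


def count_union(reads: List[List[Block]], features: Dict[str, List[Block]]) -> Counter:
    """Count reads per feature following the 'union' overlap rule.

    Each read is a list of aligned blocks, so spliced reads are supported.
    """
    counts: Counter = Counter()
    for blocks in reads:
        # Union of the per-base feature sets over every aligned base.
        hit: Set[str] = set()
        for start, end in blocks:
            for pos in range(start, end):
                hit |= features_at(pos, features)
        if not hit:
            counts["__no_feature"] += 1
        elif len(hit) == 1:
            counts[next(iter(hit))] += 1
        else:
            counts["__ambiguous"] += 1
    return counts
```

The per-base loop is deliberately naive to mirror the definition. A real counter would index features by position instead of scanning them for every base.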
“…The raw reads were filtered by Seqtk and then mapped to the M. tb H37Rv strain reference sequence (GenBank NC_018143.1) using Bowtie2 (version: 2–2.0.5) [43]. Counting of reads per gene was performed using HTSeq followed by TMM (trimmed mean of M-values) normalization [44,45]. Differentially expressed genes were defined as those with a false discovery rate <0.05 and fold-change >2 using the edgeR software [46].…”
Section: Methods; citation type: mentioning (confidence: 99%)
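The selection rule quoted above (false discovery rate < 0.05 and fold-change > 2) can be sketched in plain Python. This is not edgeR, and it does not reproduce TMM normalization or edgeR's model fitting. It assumes per-gene p-values and fold changes are already available. It applies Benjamini-Hochberg adjustment to obtain FDR values, and it interprets “fold-change > 2” as a change of more than 2-fold in either direction, which is an assumption about the quoted criterion. The function names are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_de(genes: Sequence[str], pvalues: Sequence[float],
              fold_changes: Sequence[float], fdr: float = 0.05,
              min_fc: float = 2.0) -> List[str]:
    """Genes with FDR below `fdr` and a fold change beyond `min_fc` either way.

    Fold changes are ratios (condition B / condition A), so a 2-fold
    decrease is 0.5.
    """
    qvalues = benjamini_hochberg(pvalues)
    return [g for g, q, fc in zip(genes, qvalues, fold_changes)
            if q < fdr and (fc > min_fc or fc < 1.0 / min_fc)]
```

With p-values `[0.001, 0.01, 0.04, 0.5]`, the third gene has a raw p-value below 0.05. Its adjusted value of about 0.053 nonetheless fails the FDR cutoff, which is why the rule filters on FDR rather than raw p-values.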