2023
DOI: 10.1101/2023.03.02.530868
Preprint

Identifying promoter sequence architectures via a chunking-based algorithm using non-negative matrix factorisation

Abstract: Core promoters are stretches of DNA at the beginning of genes that contain information facilitating the binding of the transcription initiation complex. Different functional subsets of genes have core promoters with distinct architectures and characteristic motifs. Some of these motifs inform the selection of transcription start sites (TSS). By discovering motifs at fixed distances from known TSS positions, we could in principle classify promoters into different functional groups. Due to the variability and o…

Cited by 2 publications (1 citation statement)
References 51 publications

“…Based on previous extensive research on promoters of other species (reviewed in [45] and elsewhere), we posited that regions 50 bp up- and downstream of the dominant TSS likely contained a PIC-binding sequence. We employed seqArchR [46], a recently developed software that uses unsupervised approach using non-negative matrix factorization, to cluster promoter sequences based on their motifs at near-fixed distances from a reference point, such as TSS. These clusters are characterized by de novo-identified sequence elements, such as position-specific motifs and the nucleotide composition of the input sequences.…”
Section: Results (mentioning)
confidence: 99%
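
The citation statement above describes clustering promoter sequences with non-negative matrix factorisation (NMF) on motifs at near-fixed distances from the TSS. The general idea can be illustrated with a minimal sketch; it is not seqArchR's implementation and omits the chunking step named in the title. The toy sequences, the helper names (`one_hot`, `nmf`, `frobenius_error`), the rank of 2, and the use of Lee-Seung multiplicative updates are all assumptions made for illustration.

```python
import random

ALPHABET = "ACGT"

def one_hot(seq):
    """Flatten a DNA sequence into a position-by-nucleotide 0/1 vector."""
    return [1.0 if base == letter else 0.0 for base in seq for letter in ALPHABET]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Approximate non-negative V as W @ H using multiplicative updates
    (an assumed, generic NMF solver chosen for brevity)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + eps for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_cols)]
             for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_rows)]
    return w, h

def frobenius_error(v, w, h):
    wh = matmul(w, h)
    return sum((a - b) ** 2 for rv, rw in zip(v, wh) for a, b in zip(rv, rw)) ** 0.5

# Hypothetical toy "promoters", all aligned to the same reference point:
# three carry TATA at positions 2-5, three carry GCGC at the same positions.
seqs = ["ACTATAAG", "GGTATACC", "CTTATAGA", "ATGCGCTA", "TAGCGCAG", "CCGCGCTT"]

# V has one row per (position, nucleotide) feature and one column per sequence.
v = transpose([one_hot(s) for s in seqs])
rank = 2
w, h = nmf(v, rank)

# Assign each sequence to the basis vector with the largest coefficient in H.
clusters = [max(range(rank), key=lambda k: h[k][j]) for j in range(len(seqs))]
print(clusters)
```

In this sketch each column of `w` can be read as a position-specific nucleotide profile (reshape it to 8 positions by 4 nucleotides), and each column of `h` says how strongly a sequence uses each profile. Because the features are tied to positions, only motifs that recur at the same offset from the reference point reinforce one another.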