DNA replication in eukaryotes is an intricate process that is
precisely coordinated by a set of regulatory proteins. Replication
forks emanate from discrete sites on chromatin called origins of replication
(Oris). These sites are regarded as the gateway to chromosomal replication
and are often characterized by sequence motifs. However, such consensus sequences are
found in only a small subset of origin regions, or are entirely absent,
across different metazoans. DNA secondary
structural features can therefore provide additional information beyond
the primary sequence. In this article, we report the trends in DNA
sequence-based structural properties of origin sequences in nine eukaryotic
systems representing different families of life. Biologically relevant
DNA secondary structural properties, namely, stability, propeller
twist, flexibility, and minor groove shape were studied in the sequences
flanking replication start sites. Results indicate that Oris in yeasts
show lower stability, more rigidity, and narrow minor groove preferences
compared to genomic sequences surrounding them. Yeast Oris also show
preference for A-tracts and the promoter element TATA box in the vicinity
of replication start sites. In contrast, Drosophila melanogaster,
humans, and Arabidopsis thaliana lack such features in their Oris
and instead show a high preponderance of G-rich sequence motifs
such as putative G-quadruplexes or i-motifs and CpG islands. Our
study applies DNA structural feature computation to examine
origins of replication across organisms ranging from yeasts to mammals
and including a plant. Insights from this study should aid
in understanding origin architecture and in designing new algorithms
for predicting DNA trans-acting factor recognition events.
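As an illustration of how putative G-quadruplex motifs of the kind mentioned above can be located in a sequence, the following is a minimal sketch using a commonly used regular expression (four runs of at least three guanines separated by loops of 1 to 7 nucleotides). The pattern, the function name `find_motifs`, and the loop-length limits are assumptions made for this example; the abstract does not state which motif definition the authors used. The C-rich pattern is simply the complementary-strand counterpart and also serves as a rough candidate i-motif scan.

```python
import re

# Assumed motif definition: four G-runs (>= 3 G) separated by loops of 1-7 nt.
# This is a widely used heuristic, not necessarily the one used in the study.
G4_PATTERN = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")

# Complementary-strand counterpart (C-runs); also a rough i-motif candidate scan.
C4_PATTERN = re.compile(r"C{3,}[ACGT]{1,7}C{3,}[ACGT]{1,7}C{3,}[ACGT]{1,7}C{3,}")


def find_motifs(sequence, pattern):
    """Return (start, end, matched_text) for each non-overlapping match.

    Coordinates are 0-based, end-exclusive, on the strand given in `sequence`.
    """
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq)]
```

Calling `find_motifs(seq, G4_PATTERN)` and `find_motifs(seq, C4_PATTERN)` on the same sequence covers candidate motifs on both strands. Because `finditer` reports non-overlapping matches only, overlapping candidates would need a lookahead-based pattern.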
Eukaryotic transcription is orchestrated from a stretch of
DNA known as the core promoter. Diverse and precisely positioned
core promoter signals, viz., TATA-box, Inr, BREs,
and Pause Button, are associated with a subset of genes and regulate
their spatiotemporal expression. However, the core promoter architecture
linked with these signals has not been investigated exhaustively for
several species. In this study, we attempted to characterize the adaptive
binding landscape of the transcription initiation machinery as a function
of DNA structure. To this end, we applied a set of k-mer-based DNA structural estimates and regular expression models, derived
from experiments, molecular dynamics simulations, and theoretical frameworks,
to high-throughput promoter data sets retrieved from the Eukaryotic
Promoter Database. We categorized protein-coding gene core promoters
based on characteristic motifs at precise locations and analyzed the
B-DNA structural properties and non-B-DNA structural motifs for 15
different eukaryotic genomes. We observed that Inr, BREd, and no-motif
classes display common patterns of DNA sequence and structural environment.
TATA-containing, BREu, and Pause Button classes show distinct behavior:
the TATA class displays varied axial and twisting flexibility,
while BREu and Pause Button lean toward G-quadruplex motif enrichment.
Intriguingly, DNA meltability and shape signals are conserved irrespective
of the presence or absence of distinct core promoter motifs in the
majority of species. Altogether, here we delineated the conserved
DNA structural signals associated with several promoter classes that
may contribute to the chromatin configuration, orchestration of transcription
machinery, and DNA duplex melting during the transcription process.
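The k-mer-based structural estimates described above are typically computed by looking up a per-k-mer value and averaging it over a sliding window along the promoter. The following is a minimal sketch of that idea. The function name `kmer_profile`, the window size, and especially the lookup table are assumptions: the toy table simply counts G/C bases per dinucleotide as a stand-in and is not a published stability, flexibility, or shape parameter set.

```python
from itertools import product


def kmer_profile(sequence, table, k=2, window=15):
    """Sliding-window mean of a per-k-mer property along a sequence.

    `table` maps every k-mer (uppercase A/C/G/T) to a numeric value.
    Returns one value per window of `window` consecutive overlapping k-mers.
    Raises KeyError for k-mers missing from `table` (e.g. containing N).
    """
    seq = sequence.upper()
    steps = [table[seq[i:i + k]] for i in range(len(seq) - k + 1)]
    if window < 1 or len(steps) < window:
        return []
    running = sum(steps[:window])
    profile = [running / window]
    for i in range(window, len(steps)):
        # Slide the window by one k-mer step: add the new value, drop the oldest.
        running += steps[i] - steps[i - window]
        profile.append(running / window)
    return profile


# Toy stand-in table (assumption): number of G/C bases in each dinucleotide.
# Replace with real dinucleotide parameters for an actual analysis.
TOY_GC_TABLE = {
    "".join(p): sum(base in "GC" for base in p) for p in product("ACGT", repeat=2)
}
```

In practice, promoter sequences would be aligned at the transcription start site and the resulting profiles averaged position by position across all promoters in a class; that aggregation step is omitted here.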
Machine learning, a rapidly evolving field of data analysis, has now become an integral part of life science research. It has been widely used to explore the information encoded by the genome and beyond the genome. In this study, we surveyed the trends of scientific actors and the conceptual structure of machine learning implementation in biomedical research through the published literature retrieved from the PubMed search engine. A longitudinal cohort bibliographic coupling was executed by employing the VOSviewer tool for four time periods,