Abstract: In the present work, we quantify the irregularity of different European languages belonging to four linguistic families (Romance, Germanic, Uralic and Slavic) and an artificial language (Esperanto). We modified a well-known method to calculate the approximate and sample entropy of written texts. We find differences in the degree of irregularity between the families, and our method, which is based on the search for regularities in a sequence of symbols, consistently distinguishes between natural and synthetic randomized texts. Moreover, we extended our study to account for multiple scales, as in multiscale entropy analysis. Our results reveal that real texts have non-trivial structure compared with texts obtained from randomization procedures.
Gene co-expression networks have become a common approach to integrating the vast amounts of information coming from gene expression studies in cancer cohorts. The reprogramming of gene regulatory control, and of the molecular pathways that depend on such control, is central to characterizing the disease and to unveiling its consequences for cancer prognosis and therapeutics. There is, however, a multitude of factors that have been associated with anomalous control of gene expression in cancer. In the particular case of co-expression patterns, we have previously documented a loss of long-distance co-expression in several cancer types, including breast cancer. Of the many potential factors that may contribute to this phenomenon, copy number variants (CNVs) have often been discussed. However, no systematic assessment of the role that CNVs may play in shaping gene co-expression patterns in breast cancer has been performed to date. For this reason, we developed such an analysis. In this study, we use probabilistic modeling techniques to evaluate to what extent CNVs affect long- and short-range co-expression in Luminal B breast tumors. We analyzed the co-expression patterns in chromosome 8, since it is known to be affected by amplifications and deletions during cancer development. We found that the CNV pattern in chromosome 8 of the Luminal B network does not significantly alter the co-expression patterns, which means that the co-expression program in this cancer phenotype is not determined by CNV structure. Additionally, we found that regions 8q24.3 and 8p21.3 are highly dense in interactions. The most connected genes in this network belong to those cytobands and are associated with several manifestations of cancer in different tissues.
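The abstract does not detail its probabilistic model. As a hedged illustration only, the sketch below estimates pairwise mutual information (MI) between binned expression profiles with a simple plug-in estimator, keeps the strongest pairs, and records each pair's genomic distance so that edges can later be labeled short- or long-range. The number of bins, `top_k`, and the gene positions are hypothetical inputs. Published co-expression pipelines typically use more careful MI estimators and significance thresholds.

```python
import math
from collections import Counter
from itertools import combinations


def discretize(values, bins):
    """Equal-width binning of one gene's expression profile."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant profile: every sample lands in bin 0
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=4):
    """Plug-in MI estimate (in nats) between two expression profiles."""
    dx, dy = discretize(x, bins), discretize(y, bins)
    n = len(dx)
    pxy, px, py = Counter(zip(dx, dy)), Counter(dx), Counter(dy)
    # p(a,b) / (p(a) p(b)) written with raw counts: c * n / (count_a * count_b)
    return sum(c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def top_edges(expr, positions, top_k):
    """Rank gene pairs by MI; attach |position difference| (hypothetical bp coordinates)."""
    scored = sorted(
        ((mutual_information(expr[g1], expr[g2]), g1, g2)
         for g1, g2 in combinations(expr, 2)),
        reverse=True,
    )
    return [(g1, g2, mi, abs(positions[g1] - positions[g2]))
            for mi, g1, g2 in scored[:top_k]]
```

Here `expr` maps gene names to expression vectors over the same samples. A distance cutoff applied to the last field of each edge would split the network into short- and long-range interactions.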
Interestingly, among the most connected genes we found MAF1 and POLR3D, which may constitute an axis of regulation of gene transcription, in particular for non-coding RNA species. We believe that by advancing our knowledge of the molecular mechanisms behind gene regulation in cancer, we will be better equipped not only to understand tumor biology but also to broaden the scope of diagnostic, prognostic and therapeutic interventions that ultimately benefit oncology patients.
We present a study of natural language using the recurrence network method. In our approach, the repetition of character patterns in written texts from different natural languages is evaluated without considering word structure. Our dataset comprises 85 eBooks written in 17 different European languages. The similarity between patterns of length m is measured by the Hamming distance, and a threshold value r defines a match between two patterns, i.e., a repetition occurs when the Hamming distance is less than or equal to r. In this way, we calculate the adjacency matrix, where a connection between two nodes exists when a match occurs. Next, the recurrence network is constructed for each text and some representative network metrics are calculated. Our results show that the average values of network density, clustering, and assortativity are larger than those of the corresponding shuffled versions, while for metrics such as closeness, original and random sequences exhibit similar values. Moreover, our calculations show similar average density values among languages that belong to the same linguistic family. In addition, applying linear discriminant analysis to the network-density properties yields well-separated clusters of language families. Finally, we discuss our results in the context of the general characteristics of written texts.
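A minimal Python sketch of the construction described above follows. Every substring of length m becomes a node. Two nodes are linked when their Hamming distance is at most r, with self-loops excluded. Density and average local clustering are then computed from the adjacency sets. The function names are hypothetical, and assortativity, closeness, and the linear discriminant analysis step are omitted.

```python
def recurrence_network(text: str, m: int, r: int) -> list[set[int]]:
    """Adjacency sets: node i is the pattern text[i:i+m]; edge if Hamming distance <= r."""
    patterns = [text[i:i + m] for i in range(len(text) - m + 1)]
    adj = [set() for _ in patterns]
    for i in range(len(patterns)):
        for j in range(i + 1, len(patterns)):
            if sum(a != b for a, b in zip(patterns[i], patterns[j])) <= r:
                adj[i].add(j)  # undirected edge, no self-loops
                adj[j].add(i)
    return adj


def density(adj: list[set[int]]) -> float:
    """Fraction of possible undirected edges that are present."""
    n = len(adj)
    edges = sum(len(nb) for nb in adj) / 2
    return 2 * edges / (n * (n - 1))


def average_clustering(adj: list[set[int]]) -> float:
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for nb in adj:
        k = len(nb)
        if k < 2:
            continue
        links = sum(1 for u in nb for v in adj[u] if v in nb) / 2  # edges among neighbours
        total += 2 * links / (k * (k - 1))
    return total / len(adj)
```

Comparing `density(recurrence_network(text, m, r))` with the same quantity for a shuffled copy of `text` gives the kind of original-versus-shuffled comparison the abstract reports. The all-pairs loop is quadratic in the number of patterns, so whole books would need a faster matching strategy.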
An [Formula: see text]-gram in music is defined as an ordered sequence of [Formula: see text] notes of a melodic sequence [Formula: see text]. [Formula: see text] is calculated as the average occurrence probability, without self-matches, of all [Formula: see text]-grams in [Formula: see text]. Then, [Formula: see text] is compared with the averages Shuff[Formula: see text] and Equip[Formula: see text], calculated from random sequences with the same length and set of symbols as [Formula: see text], constructed either by shuffling the given sequence or by distributing the set of symbols equiprobably. For all [Formula: see text], both [Formula: see text] and [Formula: see text], and these differences increase with [Formula: see text] and with the number of notes, which shows that the notes in melodic sequences are chosen and arranged in very repetitive ways, in contrast to random music. For instance, for [Formula: see text] and for all analyzed genres, we found that [Formula: see text], while [Formula: see text] and [Formula: see text]. The [Formula: see text] values of the baroque and classical genres are larger than that of the romantic genre. [Formula: see text] vs. [Formula: see text] is well fitted by stretched exponentials for all songs. This simple method can be applied to any musical genre and generalized to polyphonic sequences.
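The symbols in this abstract were lost in extraction, so the following sketch uses its own notation: `n` is the n-gram length and `seq` is a list of notes. It assumes that the occurrence probability without self-matches of the n-gram at position i is the fraction of the other k - 1 positions that hold the same n-gram, averaged over all k positions. This reading of the definition is an assumption. Averaging (c - 1)/(k - 1) over every occurrence of an n-gram with c copies gives the closed form sum of c(c - 1) divided by k(k - 1), which the function uses. Both random baselines from the abstract (shuffling, and drawing the same set of symbols equiprobably) are included.

```python
import random
from collections import Counter


def mean_occurrence_probability(seq, n):
    """Average over positions of the share of *other* positions holding the same n-gram."""
    grams = [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]
    k = len(grams)
    if k < 2:
        return 0.0
    counts = Counter(grams)
    # Each of the c copies of a gram sees c - 1 matches among k - 1 other positions.
    return sum(c * (c - 1) for c in counts.values()) / (k * (k - 1))


def shuffled_baseline(seq, n, trials=100, seed=0):
    """Average over random permutations of seq (same length, same note counts)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        s = list(seq)
        rng.shuffle(s)
        total += mean_occurrence_probability(s, n)
    return total / trials


def equiprobable_baseline(seq, n, trials=100, seed=0):
    """Average over sequences drawing the same set of symbols with equal probability."""
    rng = random.Random(seed)
    symbols = sorted(set(seq))
    total = 0.0
    for _ in range(trials):
        s = [rng.choice(symbols) for _ in seq]
        total += mean_occurrence_probability(s, n)
    return total / trials
```

A melody can be passed as note names or MIDI pitch numbers. Evaluating the three functions over a range of `n` produces curves that can be compared the way the abstract compares real and random music.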
Gene co-expression networks are a useful tool for studying interactions, and they have allowed the visualization and quantification of diverse phenomena, including the loss of long-distance co-expression in cancerous samples. This characteristic, which could be considered fundamental to cancer, has been widely reported in various types of tumors. Since copy number variations (CNVs) have previously been identified as causes of multiple genetic diseases, and gene expression is linked to them, they have often been mentioned as a probable cause of the loss of co-expression in cancer networks. To test the validity of this statement comparatively, we took 477 protein-coding genes from chromosome 8 and the CNVs of 101 genes, also protein-coding, belonging to the 8q24.3 region, a cytoband that is particularly active in the onset of breast cancer. We created CNV-conditioned co-expression networks for each of the 101 genes in the 8q24.3 region using conditional mutual information. The study was carried out on the four molecular subtypes of breast cancer (Luminal A, Luminal B, Her2, and Basal), as well as on healthy samples as a control. We observed that, in all cancer cases, the Kolmogorov-Smirnov statistic shows no significant differences between the networks obtained for the different CNV values. Furthermore, the co-expression interactions are stronger in all cancer subtypes than in the control networks. However, the control network presents a homogeneously distributed set of co-expression interactions, while in the cancer networks the strongest interactions are concentrated in specific cytobands, in particular 8q24.3 and 8p21.3. With this approach, we demonstrate that, although copy number alterations in the 8q24 region are a common trait in breast cancer, the loss of long-distance co-expression in breast cancer is not determined by CNVs.
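Conditional mutual information I(X;Y|Z) measures how much co-expression between genes X and Y remains once the copy-number state Z of a third gene is known. It is the sum over (x, y, z) of p(x,y,z) log[p(z) p(x,y,z) / (p(x,z) p(y,z))]. The plug-in estimator below works on already-discretized samples and is a sketch, not the authors' pipeline. The binning of expression values and the encoding of CNV states (for example -1, 0, +1) are assumptions.

```python
import math
from collections import Counter


def conditional_mutual_information(x, y, z):
    """Plug-in estimate of I(X;Y|Z) in nats for discrete samples of equal length."""
    n = len(x)
    c_xyz = Counter(zip(x, y, z))
    c_xz, c_yz, c_z = Counter(zip(x, z)), Counter(zip(y, z)), Counter(z)
    # p(z) p(x,y,z) / (p(x,z) p(y,z)) expressed with raw counts (the n's cancel)
    return sum(
        c / n * math.log(c_z[zk] * c / (c_xz[(xk, zk)] * c_yz[(yk, zk)]))
        for (xk, yk, zk), c in c_xyz.items()
    )


def mutual_information(x, y):
    """Unconditional I(X;Y): conditioning on a constant variable changes nothing."""
    return conditional_mutual_information(x, y, [0] * len(x))
```

When `x`, `y`, and `z` are all the balanced binary profile `[0, 1, 0, 1]`, `mutual_information(x, y)` equals log 2 (about 0.693) while the conditional value is 0, because the shared copy-number state fully explains the co-expression. Distributions of conditioned values can then be compared with a two-sample Kolmogorov-Smirnov test, for instance `scipy.stats.ks_2samp`.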