Clustering is an unsupervised learning method that groups data points based on similarity and is used to reveal the underlying structure of data. This computational approach is essential to understanding and visualizing the complex data that are acquired in high-throughput multidimensional biological experiments. Clustering enables researchers to make biological inferences for further experiments. Although clustering is a powerful technique, inappropriate application can lead biological researchers to waste resources and time in experimental follow-up. We review common pitfalls identified from the published molecular biology literature and present methods to avoid them. Commonly encountered pitfalls relate to the high-dimensional nature of biological data from high-throughput experiments, the failure to consider more than one clustering method for a given problem, and the difficulty in determining whether clustering has produced meaningful results. We present concrete examples of problems and solutions (clustering results) in the form of toy problems and real biological data for these issues. We also discuss ensemble clustering as an easy-to-implement method that enables the exploration of multiple clustering solutions and improves the robustness of clustering solutions. Increased awareness of common clustering pitfalls will help researchers avoid overinterpreting or misinterpreting the results and missing valuable insights when clustering biological data.
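The ensemble idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes one common strategy, a co-association (evidence accumulation) matrix built from repeated k-means runs, which is not necessarily the procedure used in the article. The function names (`kmeans`, `co_association`, `consensus_clusters`), the farthest-first seeding, the 0.5 threshold, and the synthetic two-blob data are all illustrative assumptions.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, rng, iters=25):
    """Lloyd's k-means with farthest-first seeding; returns one label per point."""
    centers = [rng.choice(points)]  # random first center gives run-to-run diversity
    while len(centers) < k:         # each further center is the point farthest from those chosen
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        # Assignment step: index of the nearest center for every point.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:             # keep the old center if a cluster empties
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels

def co_association(points, ks=(2, 3), runs_per_k=10, seed=0):
    """Fraction of base clusterings in which each pair of points shares a cluster."""
    rng = random.Random(seed)
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    total = 0
    for k in ks:
        for _ in range(runs_per_k):
            labels = kmeans(points, k, rng)
            total += 1
            for i in range(n):
                for j in range(n):
                    counts[i][j] += labels[i] == labels[j]
    return [[c / total for c in row] for row in counts]

def consensus_clusters(coassoc, threshold=0.5):
    """Connected components of the graph linking pairs with co-association >= threshold."""
    n = len(coassoc)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and coassoc[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Synthetic toy data (an assumption for illustration): two well-separated 2-D blobs.
data_rng = random.Random(42)
points = [(data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5))
          for cx, cy in ((0, 0), (10, 10)) for _ in range(10)]

coassoc = co_association(points)
consensus = consensus_clusters(coassoc)
print(consensus)
```

Inspecting `coassoc` directly is often as informative as the final labels: pairs with intermediate values are the points whose grouping depends on the choice of k or initialization, which is where overinterpretation is most likely.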
The EGF receptor can bind seven different agonist ligands. Although each agonist appears to stimulate the same suite of downstream signaling proteins, different agonists are capable of inducing distinct responses in the same cell. To determine the basis for these differences, we used luciferase fragment complementation imaging to monitor the recruitment of Cbl, CrkL, Gab1, Grb2, PI3K, p52 Shc, p66 Shc, and Shp2 to the EGF receptor when stimulated by the seven EGF receptor ligands. Recruitment of all eight proteins was rapid, dose-dependent, and inhibited by erlotinib and lapatinib, although to differing extents. Comparison of the time course of recruitment of the eight proteins in response to a fixed concentration of each growth factor revealed differences among the growth factors that could contribute to their differing biological effects. Principal component analysis of the resulting data set confirmed that the recruitment of these proteins differed between agonists and also between different doses of the same agonist. Ensemble clustering of the overall response to the different growth factors suggests that these EGF receptor ligands fall into two major groups as follows: (i) EGF, amphiregulin, and EPR; and (ii) betacellulin, TGFα, and epigen. Heparin-binding EGF is distantly related to both clusters. Our data identify differences in network utilization by different EGF receptor agonists and highlight the need to characterize network interactions under conditions other than high dose EGF.
T-box transcription factors are critical developmental regulators in all multi-cellular organisms, and altered T-box factor activity is associated with a variety of human congenital diseases and cancers. Despite the biological significance of T-box factors, their mechanism of action is not well understood. Here we examine whether SUMOylation affects the function of the C. elegans Tbx2 sub-family T-box factor TBX-2. We have previously shown that TBX-2 interacts with the E2 SUMO-conjugating enzyme UBC-9, and that loss of TBX-2 or UBC-9 produces identical defects in ABa-derived pharyngeal muscle development. We now show that TBX-2 is SUMOylated in mammalian cell assays, and that both UBC-9 interaction and SUMOylation depend on two SUMO consensus sites located in the T-box DNA binding domain and near the TBX-2 C-terminus, respectively. In co-transfection assays, a TBX-2:GAL4 fusion protein represses expression of a 5xGal4:tk:luciferase construct. However, this activity does not require SUMOylation, indicating SUMO is not generally required for TBX-2 repressor activity. In C. elegans, reducing SUMOylation enhances the phenotype of a temperature-sensitive tbx-2 mutant and results in ectopic expression of a gene normally repressed by TBX-2, demonstrating that SUMOylation is important for TBX-2 function in vivo. Finally, we show that the mammalian orthologs of TBX-2, Tbx2 and Tbx3, can also be SUMOylated, suggesting SUMOylation may be a conserved mechanism controlling T-box factor activity. Electronic supplementary material: The online version of this article (doi:10.1007/s00018-013-1336-y) contains supplementary material, which is available to authorized users.
Background: Transcription factors are thought to regulate the transcription of microRNA genes in a manner similar to that of protein-coding genes; that is, by binding to conventional transcription factor binding site DNA sequences located in or near promoter regions that lie upstream of the microRNA genes. However, in the course of analyzing the genomics of human microRNA genes, we noticed that annotated transcription factor binding sites commonly lie within 70- to 110-nt long microRNA small hairpin precursor sequences. Results: We report that about 45% of all human small hairpin microRNA (pre-miR) sequences contain at least one predicted transcription factor binding site motif that is conserved across human, mouse and rat, and this rises to over 75% if one excludes primate-specific pre-miRs. The association is robust and has extremely strong statistical significance; it affects both intergenic and intronic pre-miRs and both isolated and clustered microRNA genes. We also confirmed and extended this finding using a separate analysis that examined all human pre-miR sequences regardless of conservation across species. Conclusions: The transcription factor binding sites localized within small hairpin microRNA precursor sequences may regulate their transcription. Transcription factors may also bind directly to nascent primary microRNA gene transcripts or small hairpin microRNA precursors and regulate their processing. Reviewers: This article was reviewed by Guillaume Bourque (nominated by Jerzy Jurka), Dmitri Pervouchine (nominated by Mikhail Gelfand), and Yuriy Gusev.