We propose ClustMe, a new visual quality measure to rank monochrome scatterplots based on cluster patterns. ClustMe is based on data collected from a human‐subjects study, in which 34 participants judged synthetically generated cluster patterns in 1000 scatterplots. We generated these patterns by carefully varying the free parameters of a simple Gaussian Mixture Model with two components, and asked the participants to count the number of clusters they could see (1 or more than 1). Based on the results, we form ClustMe by selecting the model that best predicts these human judgments among 7 different state‐of‐the‐art merging techniques (Demp). To quantitatively evaluate ClustMe, we conducted a second study, in which 31 human subjects ranked 435 pairs of scatterplots of real and synthetic data in terms of cluster pattern complexity. We use these data to compare ClustMe's performance to 4 other state‐of‐the‐art clustering measures, including the well‐known Clumpiness scagnostics. We found that, of all measures, ClustMe is in strongest agreement with the human rankings.
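As a rough illustration of the stimulus generation described above, the sketch below draws 2D points from a two-component Gaussian mixture. It is only a minimal sketch: the function name, the parameter names, and the restriction to isotropic components (one standard deviation per component) are assumptions made here for brevity, not details taken from the study.

```python
import random

def sample_two_component_gmm(n, mean_a, mean_b, sigma_a, sigma_b, weight_a, seed=0):
    """Draw n 2D points from a mixture of two isotropic Gaussians.

    All parameter names are hypothetical; isotropy is a simplifying assumption.
    """
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        # Choose a component according to the mixing weight, then sample from it.
        if rng.random() < weight_a:
            mean, sigma = mean_a, sigma_a
        else:
            mean, sigma = mean_b, sigma_b
        points.append((rng.gauss(mean[0], sigma), rng.gauss(mean[1], sigma)))
    return points

# Example: two components whose centers are 3 units apart, with unequal spread.
points = sample_two_component_gmm(500, (0.0, 0.0), (3.0, 0.0), 1.0, 0.5, 0.6)
```

Varying the distance between the means, the two spreads, and the mixing weight changes how clearly the two components read as separate clusters in the resulting scatterplot.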
Aims/Introduction: The progression from prediabetes to type 2 diabetes is preventable by lifestyle intervention and/or pharmacotherapy in a large fraction of individuals with prediabetes. Our objective was to develop a risk score to screen for prediabetes in the Middle East, where diabetes prevalence is one of the highest in the world. Materials and Methods: In this cross-sectional, case-control study, we used data from 4,895 controls and 2,373 prediabetic adults obtained from the Qatar Biobank cohort. Significant risk factors were identified by logistic regression and other machine learning methods. Receiver operating characteristic analysis was used to calculate the area under the curve, cutoff point, sensitivity, specificity, and positive and negative predictive values. The prediabetes risk score was developed from data of Qatari citizens, as well as long-term (≥15 years) residents. Results: The significant risk factors for the Prediabetes Risk Score in Qatar were age, sex, body mass index, waist circumference and blood pressure. The risk score ranges from 0 to 45. The area under the curve of the score was 80% (95% confidence interval 78-83%), and the cutoff point of 16 yielded sensitivity and specificity of 86.2% (95% confidence interval 82.7-89.2%) and 57.9% (95% confidence interval 65.5-71.4%), respectively. Prediabetes Risk Score in Qatar performed equally well in Qatari nationals and long-term residents. Conclusions: Prediabetes Risk Score in Qatar is the first prediabetes screening score developed in a Middle Eastern population. It uses only risk factors measured non-invasively, is simple and cost-effective, and can be easily understood by the general public and health providers. Prediabetes Risk Score in Qatar is an important tool for early detection of prediabetes, and can help tremendously in curbing the diabetes epidemic in the region.
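To make the reported screening metrics concrete, the sketch below computes sensitivity, specificity, and positive and negative predictive values for a risk score at a given cutoff. The function name, the toy data, and the convention that a score at or above the cutoff counts as a positive screen are assumptions for illustration; the abstract does not state these details.

```python
def screening_metrics(scores, has_condition, cutoff):
    """Confusion-matrix metrics for a risk score at one cutoff.

    Assumes (hypothetically) that score >= cutoff means a positive screen.
    """
    tp = fp = tn = fn = 0
    for score, positive in zip(scores, has_condition):
        flagged = score >= cutoff
        if flagged and positive:
            tp += 1
        elif flagged:
            fp += 1
        elif positive:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Return None instead of dividing by zero when a group is empty.
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positives among all cases
        "specificity": ratio(tn, tn + fp),  # true negatives among all controls
        "ppv": ratio(tp, tp + fp),          # cases among positive screens
        "npv": ratio(tn, tn + fn),          # controls among negative screens
    }

# Made-up toy data, not study data: eight people scored on a 0-45 scale.
toy_scores = [5, 12, 16, 20, 30, 10, 18, 25]
toy_labels = [False, False, True, True, True, True, False, False]
metrics = screening_metrics(toy_scores, toy_labels, cutoff=16)
```

Lowering the cutoff flags more people, which tends to raise sensitivity at the cost of specificity; a screening tool typically favors sensitivity.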
Background: Recently, large bio-projects dealing with the release of different genomes have transpired. Most of these projects use next-generation sequencing platforms. As a consequence, many de novo assembly tools have evolved to assemble the reads generated by these platforms. Each tool has its own inherent advantages and disadvantages, which make the selection of an appropriate tool a challenging task. Results: We have evaluated the performance of frequently used de novo assemblers namely ABySS, IDBA-UD, Minia, SOAP, SPAdes, Sparse, and Velvet. These assemblers are assessed based on their output quality during the assembly process conducted over fungal data. We compared the performance of these assemblers by considering both computational as well as quality metrics. By analyzing these performance metrics, the assemblers are ranked and a procedure for choosing the candidate assembler is illustrated. Conclusions: In this study, we propose an assessment method for the selection of de novo assemblers by considering their computational as well as quality metrics at the draft genome level. We divide the quality metrics into three groups: g1 measures the goodness of the assemblies, g2 measures the problems of the assemblies, and g3 measures the conservation elements in the assemblies. Our results demonstrate that the assemblers ABySS and IDBA-UD exhibit a good performance for the studied data from fungal genomes in terms of running time, memory, and quality. The results suggest that whole genome shotgun sequencing projects should make use of different assemblers by considering their merits.
Background: Differential expression (DE) analysis of transcriptomic data enables genome-wide analysis of gene expression changes associated with biological conditions of interest. Such analysis often provides a long list of genes that are differentially expressed between two or more groups. In general, identified differentially expressed genes (DEGs) can be subject to further downstream analysis for obtaining more biological insights such as determining enriched functional pathways or gene ontologies. Furthermore, DEGs are treated as candidate biomarkers and a small set of DEGs might be identified as biomarkers using either biological knowledge or data-driven approaches. Methods: In this work, we present a novel approach for identifying biomarkers from a list of DEGs by re-ranking them according to the Minimum Redundancy Maximum Relevance (MRMR) criterion using a repeated cross-validation feature selection procedure. Results: Using gene expression profiles for 199 children with sepsis and septic shock, we identify 108 DEGs and propose a 10-gene signature for reliably predicting pediatric sepsis mortality with an estimated Area Under ROC Curve (AUC) score of 0.89. Conclusions: Machine learning based refinement of DE analysis is a promising tool for prioritizing DEGs and discovering biomarkers from gene expression profiles. Moreover, our reported 10-gene signature for pediatric sepsis mortality may facilitate the development of reliable diagnosis and prognosis biomarkers for sepsis.
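The MRMR idea of trading relevance against redundancy can be sketched as a greedy ranking. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses absolute Pearson correlation for both relevance and redundancy (MRMR is often formulated with mutual information instead), it scores candidates by relevance minus mean redundancy, and it leaves out the repeated cross-validation wrapper entirely. The names `abs_pearson` and `mrmr_rank` and the toy data are hypothetical.

```python
import math

def abs_pearson(x, y):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)

def mrmr_rank(features, target, k):
    """Greedily pick k feature indices by relevance minus mean redundancy.

    `features` is a list of columns, one list of values per feature.
    """
    relevance = [abs_pearson(column, target) for column in features]
    selected = []
    remaining = list(range(len(features)))
    while remaining and len(selected) < k:
        def score(j):
            if not selected:
                return relevance[j]  # first pick: relevance only
            redundancy = sum(
                abs_pearson(features[j], features[s]) for s in selected
            ) / len(selected)
            return relevance[j] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Made-up toy data: feature 1 is nearly a copy of feature 0, while feature 2
# is less relevant on its own but carries independent information.
f0 = [1.0, -1.0, 1.0, -1.0]
f1 = [1.5, -1.5, 0.5, -0.5]
f2 = [1.0, 1.0, -1.0, -1.0]
y = [3.0, -1.0, 1.0, -3.0]
ranking = mrmr_rank([f0, f1, f2], y, k=3)
```

On this toy input, ranking by relevance alone would place feature 1 second, whereas the redundancy penalty moves the independent feature 2 ahead of it.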