Background: The analysis of high-throughput gene expression data with respect to sets of genes rather than individual genes has many advantages. A variety of methods have been developed for assessing the enrichment of sets of genes with respect to differential expression. In this paper we provide a comparative study of four of these methods: Fisher's exact test, Gene Set Enrichment Analysis (GSEA), Random-Sets (RS), and Gene List Analysis with Prediction Accuracy (GLAPA). The first three methods use associative statistics, while the fourth uses predictive statistics. We first compare all four methods on simulated data sets to verify that Fisher's exact test is markedly worse than the other three approaches. We then validate the other three methods on seven real data sets with known genetic perturbations and then compare the methods on two cancer data sets where our a priori knowledge is limited.
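As an illustration of the simplest of the four methods, the following is a minimal sketch of a one-sided Fisher's exact test for gene set enrichment, computed as the upper tail of the hypergeometric distribution. It is not taken from the paper: the function name, its parameters, and the example counts are all hypothetical, and the paper's own implementation may differ (for example, in how differentially expressed genes are selected).

```python
from math import comb


def fisher_enrichment_pvalue(n_genes, n_de, set_size, overlap):
    """One-sided Fisher's exact test for over-representation.

    Hypothetical helper (not from the paper). Arguments:
      n_genes  -- total number of genes measured
      n_de     -- number of genes called differentially expressed
      set_size -- number of genes in the gene set
      overlap  -- number of differentially expressed genes in the set

    Returns P(X >= overlap), where X is hypergeometric: the number of
    differentially expressed genes in a random set of size set_size.
    """
    max_overlap = min(n_de, set_size)
    # Number of ways to draw any gene set of this size.
    total = comb(n_genes, set_size)
    # Number of ways to draw a set with at least `overlap` DE genes.
    tail = sum(
        comb(n_de, k) * comb(n_genes - n_de, set_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Illustrative counts only (assumed, not data from the paper):
# 10,000 genes, 500 differentially expressed, a 40-gene set with 8 hits.
p_value = fisher_enrichment_pvalue(10_000, 500, 40, 8)
print(f"enrichment p-value: {p_value:.3g}")
```

This test uses only whether each gene passes a differential-expression cutoff and ignores the size of the expression change for each gene.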
Accurate flood mapping is important both for planning activities during emergencies and for supporting the subsequent assessment of damaged areas. Remote sensing synthetic aperture radar (SAR) imagery can be a valuable information source for this purpose. However, flood scenarios are typical examples of complex situations in which different factors have to be considered to provide an accurate and robust interpretation of the situation on the ground. For this reason, fusing remote sensing data with ancillary information can be particularly useful. In this work, a Bayesian Network (BN) is proposed to integrate remotely sensed data, such as multi-temporal SAR intensity images and InSAR coherence data, with geomorphic and other ground information. The methodology is tested on a case study of a flood that occurred in the Basilicata region (Italy) in December 2013, monitored using a time series of COSMO-SkyMed data. It is shown that the synergetic use of different information layers can help to detect the areas affected by the flood more precisely, reducing the false alarms and missed identifications that may affect algorithms based on data from a single source. The produced flood maps are compared to data obtained independently from the analysis of optical images; the comparison indicates that the proposed methodology is able to reliably follow the temporal evolution of the phenomenon, assigning high probability to the areas most likely to be flooded in spite of their heterogeneous temporal SAR/InSAR signatures, and reaching accuracies of up to 89%.
Background: In this paper we present a method for the statistical assessment of cancer predictors which make use of gene expression profiles. The methodology is applied to a new data set of microarray gene expression data collected in Casa Sollievo della Sofferenza Hospital, Foggia, Italy. The data set is made up of normal (22) and tumor (25) specimens extracted from 25 patients affected by colon cancer. We aim to answer some questions which are relevant for the automatic diagnosis of cancer, such as: Is the size of the available data set sufficient to build accurate classifiers? What is the statistical significance of the associated error rates? In what ways does accuracy depend on the adopted classification scheme? How many genes are correlated with the pathology, and how many are sufficient for an accurate colon cancer classification? The method we propose answers these questions whilst avoiding the potential pitfalls hidden in the analysis and interpretation of microarray data.