Stijn Meganck scite author profile

Batch effect removal methods for microarray gene expression data integration: a survey

Genomic data integration is a key step toward large-scale genomic data analysis. It is challenging because genomics experiments produce information from diverse sources. In this work, we review methods designed to combine genomic data recorded from microarray gene expression (MAGE) experiments. It is widely acknowledged that the main source of variation between different MAGE datasets is due to so-called 'batch effects'. The methods reviewed here perform data integration by removing (or, more precisely, attempting to remove) the unwanted variation associated with batch effects. They are presented in a unified framework together with a wide range of evaluation tools, which are necessary for assessing the efficiency and quality of the data integration process. We provide a systematic description of the MAGE data integration methodology, together with basic recommendations to help users choose appropriate tools for integrating MAGE data for large-scale analysis and evaluate these tools from different perspectives in order to quantify their efficiency. All genomic data used in this study for illustration purposes were retrieved from InSilicoDB (http://insilico.ulb.ac.be).
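To make "removing batch effects" concrete, here is a minimal sketch of per-batch mean-centering, one of the simplest possible adjustments: for each gene, the mean expression within each batch is subtracted, so batch-specific location shifts disappear. It assumes a genes x samples NumPy matrix of log-scale expression values and one batch label per sample; the function name `batch_mean_center` and the toy data are hypothetical, and the sketch is not meant to reproduce any specific method evaluated in the survey.

```python
import numpy as np


def batch_mean_center(expr: np.ndarray, batches: np.ndarray) -> np.ndarray:
    """Per-batch mean-centering of a genes x samples expression matrix.

    expr:    2-D array, rows = genes, columns = samples (log-scale values assumed)
    batches: 1-D array with one batch label per column of `expr`
    """
    adjusted = expr.astype(float)  # astype returns a copy, so `expr` is untouched
    for batch in np.unique(batches):
        cols = batches == batch  # boolean mask selecting this batch's samples
        # Subtract each gene's mean within the batch; keepdims keeps a column
        # vector so the subtraction broadcasts across the batch's samples.
        adjusted[:, cols] -= adjusted[:, cols].mean(axis=1, keepdims=True)
    return adjusted


# Hypothetical toy data: batch "B" is batch "A" shifted by +2 for every gene.
expr = np.array([[1.0, 2.0, 3.0, 4.0],
                 [5.0, 6.0, 7.0, 8.0]])
batches = np.array(["A", "A", "B", "B"])
print(batch_mean_center(expr, batches))
```

In this example both genes end up as [-0.5, 0.5, -0.5, 0.5], so the +2 offset between batches is gone. Centering like this also removes genuine biological differences whenever the groups of interest are unevenly distributed across batches, which is one reason evaluation tools are needed alongside the adjustment itself.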

A Survey on Filter Techniques for Feature Selection in Gene Expression Microarray Analysis

Lazar, Taminau, Meganck, et al. 2012

IEEE/ACM Trans. Comput. Biol. and Bioinf.

A large number of feature selection (FS) methods is available in the literature, most of which arose from the need to analyze data of very high dimensionality, usually hundreds or thousands of variables. Such data sets are now available in various application areas such as combinatorial chemistry, text mining, multivariate imaging, and bioinformatics. As a generally accepted rule, these methods are grouped into filters, wrappers, and embedded methods. More recently, a new group of methods has been added to the general FS framework: ensemble techniques. This survey focuses on filter feature selection methods for informative feature discovery in gene expression microarray (GEM) analysis, a task also known as differentially expressed gene (DEG) discovery, gene prioritization, or biomarker discovery. We present them in a unified framework, using standardized notation, in order to reveal their technical details and to highlight both their common characteristics and their particularities.
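A filter scores each gene on its own, independently of any classifier, and keeps the top-ranked genes. The sketch below uses Welch's two-sample t-statistic as the scoring function; this is just one common filter criterion, chosen for illustration, and the helper names `welch_t` and `rank_genes_by_t` are hypothetical. It assumes a genes x samples matrix and binary class labels (0/1) with at least two samples per class.

```python
import numpy as np


def welch_t(expr: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Welch's t-statistic for every gene (row) between class 0 and class 1."""
    a = expr[:, labels == 0]
    b = expr[:, labels == 1]
    mean_diff = a.mean(axis=1) - b.mean(axis=1)
    # Unequal-variance standard error; ddof=1 gives sample variances.
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                 + b.var(axis=1, ddof=1) / b.shape[1])
    # Note: a gene with zero variance in both classes would divide by zero here.
    return mean_diff / se


def rank_genes_by_t(expr: np.ndarray, labels: np.ndarray, k: int) -> np.ndarray:
    """Filter step: indices of the k genes with the largest |t|."""
    scores = np.abs(welch_t(expr, labels))
    return np.argsort(-scores)[:k]


# Hypothetical toy data: 3 genes x 6 samples, 3 samples per class.
expr = np.array([
    [1.0, 1.1, 0.9, 5.0, 5.1, 4.9],  # large, consistent shift between classes
    [2.0, 2.5, 1.5, 2.1, 2.4, 1.6],  # essentially no shift
    [1.0, 2.0, 3.0, 2.0, 3.0, 4.0],  # moderate shift, high variance
])
labels = np.array([0, 0, 0, 1, 1, 1])
print(rank_genes_by_t(expr, labels, k=2))
```

Here gene 0 ranks first and gene 2 second. Because each gene is scored in isolation, a univariate filter like this is cheap to compute for thousands of genes, but it ignores interactions between genes, which wrapper and embedded methods can take into account.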

Unlocking the potential of publicly available microarray data using inSilicoDb and inSilicoMerging R/Bioconductor packages

et al. 2012

Background: With an abundance of microarray gene expression data sets available through public repositories, new possibilities lie in combining multiple existing data sets. In this new context, analysis itself is no longer the problem; retrieving and consistently integrating all these data before delivering them to the wide variety of existing analysis tools has become the new bottleneck.

Results: We present the newly released R/Bioconductor package which, together with the earlier released R/Bioconductor package, allows consistent retrieval, integration and analysis of publicly available microarray gene expression data sets. The package also provides five visual and six quantitative validation measures.

Conclusions: By providing (i) access to uniformly curated and preprocessed data, (ii) a collection of techniques to remove the batch effects between data sets from different sources, and (iii) several validation tools for inspecting the integration process, these packages enable researchers to fully explore the potential of combining gene expression data for downstream analysis. The power of using both packages together is demonstrated by programmatically retrieving and integrating gene expression studies from the InSilico DB repository [https://insilicodb.org/app/].

Comparing the Predictive Accuracy of Case Linkage Methods in Serious Sexual Assaults

Winter, Lemeire, Meganck, et al. 2012

Journal Invest Psychology

The empirical support for linkage analysis is steadily increasing, but the question remains as to which linking method is most effective. We compared a more theory‐based, dimensional behavioural approach with a more pragmatic, multivariate behavioural approach with regard to their accuracy in linking serial sexual assaults in a UK sample of serial sexual assaults (n = 90) and one‐off sexual assaults (n = 129). Linkage accuracy was assessed by (1) using seven dimensions derived by non‐parametric Mokken scale analysis (MSA) as predictors in discriminant function analysis (DFA) and (2) using 46 crime scene characteristics simultaneously in a naive Bayesian classifier (NBC). The dimensional scales predicted 28.9% of the series correctly, whereas the NBC correctly identified 34.5% of the series. However, subsequently including non‐serial offences in the target group decreased the number of correct links, most markedly for the dimensional approach (MSA–DFA: 8.9%; NBC: 32.2%). Receiver operating characteristic (ROC) analysis was used as a more objective comparison of the two methods under both conditions, confirming that both achieved good accuracy (AUCs = .74–.89), but the NBC performed significantly better than the dimensional approach. The consequences for the practical implementation of behavioural case linkage are discussed. Copyright © 2012 John Wiley & Sons, Ltd.
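An AUC like those reported above can be read as the probability that a randomly chosen positive case (for example, a correct link) receives a higher score than a randomly chosen negative case. A minimal sketch of that pairwise (Mann-Whitney) computation follows, assuming the method outputs one score per candidate link; the function name `roc_auc` and all scores are hypothetical, not data from the study.

```python
from itertools import product


def roc_auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    Counts how often a positive outscores a negative; ties count as half.
    """
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical scores for true links (positives) and non-links (negatives).
linked = [0.9, 0.8, 0.4]
unlinked = [0.7, 0.3, 0.2, 0.4]
print(roc_auc(linked, unlinked))  # 10.5 favourable pairs out of 12 -> 0.875
```

The nested comparison costs O(n x m) operations, which is fine for samples of a few hundred cases; for much larger samples, a rank-based computation after sorting the scores is the usual choice.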

Comparison of Merging and Meta-Analysis as Alternative Approaches for Integrative Gene Expression Analysis

Taminau, Lazar, Meganck, et al. 2014

ISRN Bioinformatics

An increasing number of microarray gene expression data sets is available through public repositories. Their potential for new findings has yet to be unlocked by making them available for large-scale analysis. To do so, it is essential that independent studies designed for similar biological problems can be integrated, so that new insights can be obtained. These insights would remain undiscovered when analyzing the individual data sets, because the small number of biological samples used per experiment is a well-known bottleneck in genomic analysis. Increasing the number of samples increases statistical power, so that more general and reliable conclusions can be drawn. In this work, two approaches for conducting large-scale analysis of microarray gene expression data, meta-analysis and data merging, are compared in the context of identifying cancer-related biomarkers, by analyzing six independent lung cancer studies. Within this study, we investigate the hypothesis that analyzing large cohorts of samples obtained by merging independent data sets designed to study the same biological problem results in lower false discovery rates than analyzing the same data sets within a more conservative meta-analysis approach.
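In a meta-analysis, each study is analyzed separately and the per-gene results are then combined, whereas merging pools the samples into a single data set before one analysis. The abstract does not say which combination rule is used; as one standard example, the sketch below combines per-study p-values for a single gene with Fisher's method. The function name `fisher_combined_p` and the p-values are hypothetical.

```python
import math


def fisher_combined_p(pvalues: list[float]) -> float:
    """Combine independent per-study p-values for one gene with Fisher's method.

    Under the joint null hypothesis, X = -2 * sum(ln p_i) follows a chi-squared
    distribution with 2k degrees of freedom (k = number of studies). For even
    degrees of freedom the upper-tail probability has the closed form
    exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!, so no stats library is needed.
    """
    if not pvalues or any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must be in (0, 1]")
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is X / 2
    return math.exp(-half_x) * sum(half_x ** j / math.factorial(j) for j in range(k))


# Hypothetical p-values for one gene in three independent studies.
print(fisher_combined_p([0.04, 0.10, 0.03]))
```

Fisher's method assumes the studies are independent and is strongly driven by a single very small p-value; other combination rules, such as Stouffer's weighted z-score, trade these properties differently.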

Stijn Meganck

Batch effect removal methods for microarray gene expression data integration: a survey

A Survey on Filter Techniques for Feature Selection in Gene Expression Microarray Analysis

Unlocking the potential of publicly available microarray data using inSilicoDb and inSilicoMerging R/Bioconductor packages

Comparing the Predictive Accuracy of Case Linkage Methods in Serious Sexual Assaults

Comparison of Merging and Meta-Analysis as Alternative Approaches for Integrative Gene Expression Analysis
