Background: With an abundance of microarray gene expression data sets available through public repositories, new possibilities lie in combining multiple existing data sets. In this new context, analysis itself is no longer the problem; retrieving and consistently integrating all this data before delivering it to the wide variety of existing analysis tools becomes the new bottleneck. Results: We present the newly released R/Bioconductor package which, together with the earlier released R/Bioconductor package, allows consistent retrieval, integration and analysis of publicly available microarray gene expression data sets. Inside the package, a set of five visual and six quantitative validation measures is available as well. Conclusions: By providing (i) access to uniformly curated and preprocessed data, (ii) a collection of techniques to remove the batch effects between data sets from different sources, and (iii) several validation tools enabling the inspection of the integration process, these packages enable researchers to fully explore the potential of combining gene expression data for downstream analysis. The power of using both packages is demonstrated by programmatically retrieving and integrating gene expression studies from the InSilico DB repository [https://insilicodb.org/app/].
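To make the idea of batch effect removal concrete, the following is a minimal sketch of one of the simplest such techniques, per-gene batch mean-centering. It is written in Python with NumPy and is not the API of the R/Bioconductor packages described above; the function name `batch_mean_center` and the toy data are assumptions made purely for illustration.

```python
import numpy as np


def batch_mean_center(expr, batch):
    """Subtract, for every gene, its mean expression within each batch.

    expr  : array of shape (n_genes, n_samples)
    batch : sequence of length n_samples giving the data set of each sample

    Hypothetical helper for illustration only; not part of any package.
    """
    expr = np.asarray(expr, dtype=float)
    batch = np.asarray(batch)
    if expr.shape[1] != batch.shape[0]:
        raise ValueError("one batch label is required per sample (column)")

    centered = expr.copy()
    for label in np.unique(batch):
        cols = batch == label
        # Per-gene mean over the samples that belong to this batch.
        gene_means = expr[:, cols].mean(axis=1, keepdims=True)
        centered[:, cols] -= gene_means
    return centered


# Toy example: two genes, two studies with very different baselines.
expr = [[1.0, 2.0, 11.0, 12.0],
        [3.0, 5.0, 20.0, 24.0]]
batch = ["study_a", "study_a", "study_b", "study_b"]
merged = batch_mean_center(expr, batch)
```

After centering, every gene has mean zero within each study, so a constant offset between studies no longer dominates the merged matrix. More elaborate methods also adjust variances or use empirical Bayes shrinkage, which this sketch does not attempt.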
Computed tomography (CT) derived ventilation algorithms estimate the apparent voxel volume changes within an inhale/exhale CT image pair. Transformation-based methods compute these estimates solely from the spatial transformation acquired by applying a deformable image registration (DIR) algorithm to the image pair. However, approaches based on finite difference approximations of the transformation’s Jacobian have been shown to be numerically unstable. As a result, transformation-based CT ventilation is poorly reproducible with respect to both DIR algorithm and CT acquisition method. Purpose: We introduce a novel Integrated Jacobian Formulation (IJF) method for estimating voxel volume changes under a DIR recovered spatial transformation. The method is based on computing volume estimates of DIR mapped subregions using the hit-or-miss sampling algorithm for integral approximation. The novel approach allows for regional volume change estimates that 1) respect the resolution of the digital grid and 2) are based on approximations with quantitatively characterized and controllable levels of uncertainty. As such, the IJF method is designed to be robust to variations in DIR solutions and thus overall more reproducible. Methods: Numerically, Jacobian estimates are recovered by solving a simple constrained linear least squares problem that guarantees the recovered global volume change is equal to the global volume change obtained from the inhale and exhale lung segmentation masks. Reproducibility of the IJF method with respect to DIR solution was assessed using the expert-determined landmark point pairs and inhale/exhale phases from 10 4DCTs available on www.dir-lab.com. Reproducibility with respect to CT acquisition was assessed on the 4DCT and 4D cone beam CT (4DCBCT) images acquired for five lung cancer patients prior to radiotherapy. Results: The ten Dir-Lab 4DCT cases were registered twice with the same DIR algorithm, but with different smoothing parameters.
Finite difference Jacobian (FDJ) and IJF images were computed for both solutions. The average spatial errors (300 landmarks per case) for the two DIR solution methods were 0.98 (1.10) and 1.02 (1.11). The average Pearson correlation between the FDJ images computed from the two DIR solutions was 0.83 (0.03), while for the IJF images it was 1.00 (0.00). For inter-modality assessment, the IJF and FDJ images were computed from the 4DCT and 4DCBCT of five patients. The average Pearson correlation of the spatially aligned FDJ images was 0.27 (0.11), while it was 0.77 (0.13) for the IJF method. Conclusion: The mathematical theory underpinning the IJF method allows for the generation of ventilation images that are 1) computed with respect to DIR spatial accuracy on the digital voxel grid and 2) based on DIR measured subregional volume change estimates acquired with quantifiable and controllable levels of uncertainty. Analyses of the experiments are consistent with the mathematical theory and indicate that IJF ventilation imaging has a higher reproducibility with res...
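The hit-or-miss sampling idea mentioned in this abstract can be illustrated with a small Python sketch. It is not the authors' IJF implementation: it assumes, purely for illustration, that the transformation restricted to one voxel is affine (y = A x + t), so that membership in the mapped region can be tested by pulling samples back through the inverse map. The function name `hit_or_miss_volume` and all parameter values are hypothetical.

```python
import numpy as np


def hit_or_miss_volume(A, t, n_samples=200_000, seed=0):
    """Hit-or-miss Monte Carlo estimate of the volume of the image of the
    unit cube [0, 1]^3 under the affine map y = A @ x + t.

    Returns (volume_estimate, standard_error). Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=float)
    t = np.asarray(t, dtype=float)

    # Map the 8 cube corners forward; their axis-aligned bounding box
    # encloses the whole mapped region (a parallelepiped).
    corners = np.array(
        [[i, j, k] for i in (0, 1) for j in (0, 1) for k in (0, 1)],
        dtype=float,
    )
    mapped = corners @ A.T + t
    lo, hi = mapped.min(axis=0), mapped.max(axis=0)
    box_volume = float(np.prod(hi - lo))

    # Draw uniform samples in the bounding box.
    y = rng.uniform(lo, hi, size=(n_samples, 3))

    # A sample is a "hit" if its pre-image lies inside the unit cube.
    x = np.linalg.solve(A, (y - t).T).T
    hits = np.all((x >= 0.0) & (x <= 1.0), axis=1)

    p = float(hits.mean())
    volume = box_volume * p
    # Binomial standard error of the hit fraction, scaled by the box volume.
    std_err = box_volume * np.sqrt(p * (1.0 - p) / n_samples)
    return volume, std_err


# Example: an upper-triangular map whose exact volume change is
# det(A) = 1.1 * 0.9 * 1.2 = 1.188.
A = [[1.1, 0.1, 0.0],
     [0.0, 0.9, 0.05],
     [0.0, 0.0, 1.2]]
volume, std_err = hit_or_miss_volume(A, t=[0.0, 0.0, 0.0])
```

The standard error shrinks in proportion to one over the square root of the number of samples, which is the sense in which the uncertainty of such an estimate can be characterized and controlled by choosing `n_samples`.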
Genomics datasets are increasingly useful for gaining biomedical insights, with adoption in the clinic underway. However, multiple hurdles related to data management stand in the way of their efficient large-scale utilization. The solution proposed is a web-based data storage hub. Having clear focus, flexibility and adaptability, InSilico DB seamlessly connects genomics dataset repositories to state-of-the-art and free GUI and command-line data analysis tools. The InSilico DB platform is a powerful collaborative environment, with advanced capabilities for biocuration, dataset sharing, and dataset subsetting and combination. InSilico DB is available from https://insilicodb.org.
Background: We describe the pioneering experience of a Spanish family pursuing the goal of understanding their own personal genetic data to the fullest possible extent using Direct to Consumer (DTC) tests. With full informed consent from the Corpas family, all genotype, exome and metagenome data from members of this family are publicly available under a public domain Creative Commons 0 (CC0) license waiver. All scientists or companies analysing these data (“the Corpasome”) were invited to return results to the family. Methods: We released 5 genotypes, 4 exomes, and 1 metagenome from the Corpas family via a blog and figshare under a public domain license, inviting scientists to join the crowdsourcing efforts to analyse the genomes in return for coauthorship or acknowledgement in derived papers. Resulting analysis data were compiled via social media and direct email. Results: Here we present the results of our investigations, combining the crowdsourced contributions and our own efforts. Variant annotation services from four companies were applied to four family exomes: BIOBASE, Ingenuity, Diploid, and GeneTalk. Starting from a common VCF file and after selecting for significant results from company reports, we find no overlap among described annotations. We additionally report on a gut microbiome analysis of a member of the Corpas family. Conclusions: This study presents an analysis of a diverse set of tools and methods offered by four DTC companies. The striking discordance of the results mirrors previous findings with respect to DTC analysis of SNP chip data, and highlights the difficulties of using DTC data for preventive medical care.
To our knowledge, the data and analysis results from our crowdsourced study represent the most comprehensive exome and analysis for a family quartet using solely DTC data generation to date. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-1973-7) contains supplementary material, which is available to authorised users.