High-throughput data generated by new biotechnologies require specific and adapted statistical treatment in order to be efficiently used in biological studies. In this article, we propose a powerful framework to manage and analyse multi-omics heterogeneous data to carry out an integrative analysis. We have illustrated this using the mixOmics package for R software as it specifically addresses data integration issues. Our work also aims at applying the most recent functionalities of mixOmics to real datasets. Although multi-block integrative methodologies exist, we hope to encourage a more widespread use of such approaches in an operational framework by biologists. We have used natural populations of the model plant Arabidopsis thaliana in this work, but the framework proposed is not limited to this plant and can be deployed whatever the organisms of interest and the biological question may be. Four omics datasets (phenomics, metabolomics, cell wall proteomics and transcriptomics) were collected, analysed and integrated to study the cell wall plasticity of plants exposed to sub-optimal temperature growth conditions. The methodologies presented here start from basic univariate statistics leading to multi-block integration analysis. We have also highlighted the fact that each method, either unsupervised or supervised, is associated with one biological issue. Using this powerful framework enabled us to arrive at novel conclusions on the biological system, which would not have been possible using standard statistical approaches.
The high-throughput data generated by new biotechnologies used in biological studies require specific and adapted statistical treatments. In this work, we propose a novel and powerful framework to manage and analyse heterogeneous multi-omics data in order to carry out an integrative analysis. We illustrate it using the mixOmics package for the R software, as it specifically addresses data integration issues. Our work also aims at testing the most recent functionalities of mixOmics on real data sets because, although multi-block integrative methodologies exist, they still need to be applied in practice to broaden our know-how and to provide an operational framework for biologists. Natural populations of the model plant Arabidopsis thaliana are employed in this work, but the proposed framework is not limited to this plant and can be deployed whatever the organisms of interest and the biological question. Four omics data sets (phenomics, metabolomics, cell wall proteomics and transcriptomics) have been collected, analysed and integrated in order to study the cell wall plasticity of plants exposed to sub-optimal temperature growth conditions. The methodologies presented start from basic univariate statistics and lead to multi-block integration analysis, and we highlight the fact that each method is associated with one biological issue. Using this powerful framework led us to novel biological conclusions that could not have been reached using standard statistical approaches.
In this work, we explore dimensionality reduction techniques for univariate and multivariate time series data. In particular, we compare wavelet decomposition with convolutional variational autoencoders (VAEs) for dimension reduction. We show that VAEs are a good option for reducing the dimension of high-dimensional data such as ECG. We make these comparisons on a real-world, publicly available ECG dataset with substantial variability, using the reconstruction error as the metric. We then explore the robustness of these models to noisy data, both at training time and at inference time. These tests are intended to reflect the problems that exist in real-world time series data, and the VAE was robust in both.
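To make the wavelet side of this comparison concrete, the following is a minimal sketch of wavelet-based dimension reduction scored by reconstruction error. It is not the authors' pipeline: the Haar wavelet, the keep-the-k-largest-coefficients rule, the synthetic two-tone signal (standing in for an ECG trace) and all function names are assumptions made for illustration only.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(signal):
    """Full multi-level orthonormal Haar transform (length must be a power of two)."""
    coeffs = list(signal)
    n = len(coeffs)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    while n > 1:
        half = n // 2
        approx = [(coeffs[2 * i] + coeffs[2 * i + 1]) / SQRT2 for i in range(half)]
        detail = [(coeffs[2 * i] - coeffs[2 * i + 1]) / SQRT2 for i in range(half)]
        coeffs[:n] = approx + detail  # approximations first, then details
        n = half                      # next level works on the approximations
    return coeffs


def haar_inverse(coeffs):
    """Invert haar_forward, rebuilding the signal from coarse to fine."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        merged = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged.append((a + d) / SQRT2)
            merged.append((a - d) / SQRT2)
        out[:2 * n] = merged
        n *= 2
    return out


def compress(signal, k):
    """Reduced representation: keep the k largest-magnitude coefficients, zero the rest."""
    coeffs = haar_forward(signal)
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:k])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]


def mse(a, b):
    """Mean squared reconstruction error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Synthetic stand-in for a time series (assumption: not real ECG data).
signal = [math.sin(2 * math.pi * t / 64) + 0.5 * math.sin(2 * math.pi * t / 8)
          for t in range(256)]

# Reconstruction error as a function of how many coefficients are retained.
errors = {k: mse(signal, haar_inverse(compress(signal, k))) for k in (8, 32, 128, 256)}

if __name__ == "__main__":
    for k, err in errors.items():
        print(f"kept {k:3d} of {len(signal)} coefficients: MSE = {err:.6f}")
```

Because the transform is orthonormal, the error can only shrink as more coefficients are kept, which gives a simple baseline curve against which a learned encoder such as a VAE could be compared at the same latent size.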