Training a neural network to learn other dimensionality reduction removes data size restrictions in bioinformatics and provides a new route to exploring data representations

Thomas, S. A.; Steven, R. T.; Robinson, K. N.; Taylor, Andy; Elia, Efstathios A.; Nikula, Chelsea; Campbell, A. D.; Panina, Yulia; Najumudeen, A. K.; Murta, Teresa; Yan, Bo; Grabowski, Piotr; Hamm, Grégory; Swales, John M.; Gilmore, I. S.; Yuneva, M. O.; Goodwin, R. J.; Barry, Simon T.; Sansom, Owen J.; Takáts, Zoltán; Bunch, Josephine

doi:10.1101/2020.09.03.269555

Cited by 8 publications

(6 citation statements)

References 45 publications


“…For this reason, people have started exploring supervised ML algorithms such as random forest and unsupervised ML algorithms to analyze MSI data. ToF-SIMS data has been successfully analyzed using various ANNs in the form of self-organizing maps (SOMs) or ANNs in combination with t-distributed stochastic neighbor embedding (t-SNE). The use of ML algorithms makes it possible to reveal chemical differences in MSI data with much greater ease and less human bias, but MSI in general is behind the curve when it comes to advanced data analysis compared to other fields.…”

Section: Mass Spectrometry Imaging (mentioning)

confidence: 99%

“…ToF-SIMS data has been successfully analyzed using various ANNs [237] in the form of self-organizing maps (SOMs) [238] or ANNs in combination with t-distributed stochastic neighbor embedding (t-SNE) [239]. The use of ML algorithms makes it possible to reveal chemical differences in MSI data with much greater ease and less human bias, but MSI in general is behind the curve when it comes to advanced data analysis compared to other fields. Just like the ToF-SIMS field has been able to learn from the electron microscopy community in terms of sample preparation, it will be necessary for MSI to learn from computer scientists and engineers who have already been applying these techniques for decades.…”

Section: Mass Spectrometry Imaging (mentioning)

confidence: 99%


Multimodal Imaging Based on Vibrational Spectroscopies and Mass Spectrometry Imaging Applied to Biological Tissue: A Multiscale and Multiomics Review

et al. 2020


“…That is, we know that biologically similar regions have similar chemical profiles and similar profiles will be grouped together in the embedded space to form dense regions. This has been observed in a number of studies that use dimensionality reduction of mass spectrometry data [11], [12], [23]-[25]. In the case where the data are homogeneous, clustering is not possible and a test for homogeneity can detect this automatically.…”

Section: Density Based Estimation Of Cluster Number (mentioning)

confidence: 98%

“…Methods such as t-distributed stochastic neighbour embedding (t-SNE) [5] are state-of-the-art techniques for data reduction and visualisation. However, the lack of a known mapping prohibits the application to unseen data [12]. Autoencoders avoid this issue by learning the encoding and decoding transformation during training of the model [13].…”

Section: Introduction (mentioning)

confidence: 99%

Fast Data Driven Estimation of Cluster Number in Multiplex Images using Embedded Density Outliers

Thomas

2022

2022 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB)


The usage of chemical imaging technologies is becoming a routine accompaniment to traditional methods in pathology. Significant technological advances have developed these next-generation techniques to provide rich, spatially resolved, multidimensional chemical images. The rise of digital pathology has significantly enhanced the synergy of these imaging modalities with optical microscopy and immunohistochemistry, enhancing our understanding of the biological mechanisms and progression of diseases. Techniques such as imaging mass cytometry provide labelled multidimensional (multiplex) images of specific components used in conjunction with digital pathology techniques. These powerful techniques generate a wealth of high-dimensional data that create significant challenges in data analysis. Unsupervised methods such as clustering are an attractive way to analyse these data; however, they require the selection of parameters such as the number of clusters. Here we propose a methodology to estimate the number of clusters in an automatic data-driven manner using a deep sparse autoencoder to embed the data into a lower-dimensional space. We compute the density of regions in the embedded space, the majority of which are empty, enabling the high-density regions (i.e. clusters) to be detected as outliers and provide an estimate for the number of clusters. This framework provides a fully unsupervised and data-driven method to analyse multidimensional data. In this work we demonstrate our method using 45 multiplex imaging mass cytometry datasets. Moreover, our model is trained using only one of the datasets and the learned embedding is applied to the remaining 44 images, providing an efficient process for data analysis. Finally, we demonstrate the high computational efficiency of our method, which is two orders of magnitude faster than estimating via computing the sum of squared distances as a function of cluster number.
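The density-outlier idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 2-D grid, the `bins` and `z_thresh` parameters, the mean-plus-z-standard-deviations outlier rule, and the connected-component counting are all assumptions chosen for illustration, and the function name `estimate_cluster_number` is hypothetical. The input is assumed to be points that have already been embedded into two dimensions (for example by an autoencoder).

```python
from collections import Counter, deque
from statistics import mean, pstdev


def estimate_cluster_number(points, bins=20, z_thresh=3.0):
    """Estimate a cluster count from 2-D embedded points via density outliers.

    Assumed rule (illustrative only): a grid cell is "dense" when its count
    exceeds mean + z_thresh * std taken over all cells, empty ones included.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)

    def bin_index(value, lo, hi):
        # Map a coordinate onto a grid index in [0, bins - 1].
        if hi == lo:
            return 0
        return min(int((value - lo) / (hi - lo) * bins), bins - 1)

    # Occupancy per grid cell; cells missing from the counter are empty.
    counts = Counter(
        (bin_index(x, x_lo, x_hi), bin_index(y, y_lo, y_hi)) for x, y in points
    )

    # Baseline statistics over ALL cells, so the many empty cells dominate.
    all_counts = [counts.get((i, j), 0) for i in range(bins) for j in range(bins)]
    threshold = mean(all_counts) + z_thresh * pstdev(all_counts)
    dense = {cell for cell, c in counts.items() if c > threshold}

    # Count groups of touching dense cells (8-connectivity) by breadth-first search.
    seen = set()
    n_clusters = 0
    for start in dense:
        if start in seen:
            continue
        n_clusters += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            i, j = queue.popleft()
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    neighbour = (i + di, j + dj)
                    if neighbour in dense and neighbour not in seen:
                        seen.add(neighbour)
                        queue.append(neighbour)
    return n_clusters
```

Under these assumptions, perfectly homogeneous data (every cell equally occupied) yields no outlier cells and an estimate of zero, which mirrors the homogeneity case mentioned in the citation statement above. The grid resolution and threshold strongly affect the result and would need tuning on real embeddings.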


“…The receptive field defines a convolutional kernel window in these CNN architectures to identify salient mass spectral patterns that depend on the selected size of the receptive field (Behrmann et al., 2018). Fully connected neural networks (FCNN) were applied on MSI data to perform non-linear dimensionality reduction (Thomas et al., 2016; Inglese et al., 2017; Dexter et al., 2020), and we recently applied FCNN-based architecture to capture spatial patterns and learn underlying m/z peaks of interest from large scale MSI data while bypassing conventional preprocessing (Abdelmoula et al., 2020).…”

Section: Introduction (mentioning)

confidence: 99%
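As a rough illustration of the receptive-field notion in the statement above, the sketch below slides a kernel window along a 1-D spectrum and computes how the window seen by one output grows when stride-1 convolutions are stacked. It is a generic example, not the architecture of any cited paper; the function names `conv1d_valid` and `stacked_receptive_field` are hypothetical, and strides and dilations are assumed to be 1.

```python
def conv1d_valid(spectrum, kernel):
    """Slide `kernel` along `spectrum` without padding.

    Each output value depends on len(kernel) adjacent intensity bins, which is
    the receptive field of a single layer. As in most CNN libraries, this is a
    cross-correlation (the kernel is not flipped).
    """
    k = len(kernel)
    if k == 0 or k > len(spectrum):
        raise ValueError("kernel must be non-empty and no longer than the spectrum")
    return [
        sum(w * spectrum[i + j] for j, w in enumerate(kernel))
        for i in range(len(spectrum) - k + 1)
    ]


def stacked_receptive_field(kernel_sizes):
    """Receptive field of stacked 1-D convolutions with stride 1 and no dilation."""
    return 1 + sum(k - 1 for k in kernel_sizes)
```

For instance, two stacked layers with kernel size 3 see five adjacent bins per output, so a wider kernel or a deeper stack is needed before a single output can respond to patterns spanning more of the spectrum.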

massNet: integrated processing and classification of spatially resolved mass spectrometry data using deep learning for rapid tumor delineation

Abdelmoula, Stopka, et al. 2021

Preprint


Motivation: Mass spectrometry imaging (MSI) provides rich biochemical information in a label-free manner and therefore holds promise to substantially impact current practice in disease diagnosis. However, the complex nature of MSI data poses computational challenges in its analysis. The complexity of the data arises from its large size, high dimensionality, and spectral non-linearity. Preprocessing, including peak picking, has been used to reduce raw data complexity; however, peak picking is sensitive to parameter selection, which, perhaps prematurely, shapes the downstream analysis for tissue classification and ensuing biological interpretation. Results: We propose a deep learning model, massNet, that provides the desired qualities of scalability, non-linearity, and speed in MSI data analysis. This deep learning model was used, without prior preprocessing and peak picking, to classify MSI data from a mouse brain harboring a patient-derived tumor. The massNet architecture enabled automatic learning of predictive features, and automated methods were incorporated to identify peaks with potential for tumor delineation. The model's performance was assessed using cross-validation, and the results demonstrate higher accuracy and a 174-fold gain in speed compared to the established classical machine learning method, support vector machine. Availability and Implementation: The code is publicly available on GitHub.

