Background
Machine learning (ML) can be an effective tool for extracting information from attribute-rich molecular datasets to generate molecular diagnostic tests. However, the way in which the resulting scores or classifications are produced from the input data may not be transparent. Algorithmic explainability or interpretability has therefore become a focus of ML research. Shapley values, originally introduced in cooperative game theory, can explain the result that a complex ML algorithm generates from a specific set of input data.
Methods
For a multivariate molecular diagnostic test in clinical use (the VeriStrat® test), we calculate exact Shapley values and discuss their interpretation. We also apply standard approximation techniques for Shapley value computation (methods based on local interpretable model-agnostic explanations (LIME) and Shapley Additive Explanations (SHAP)) and compare the results with the exact Shapley values.
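To make the computation concrete, the following is a minimal sketch of exact Shapley value calculation for a tabular classifier. It is not the paper's implementation: the model scoring function, the background dataset used to estimate coalition values by marginal averaging, and the eight-feature setting are assumptions made only for illustration.

    # Minimal sketch of exact Shapley value computation for a tabular classifier.
    # Assumptions (not from the paper): a scoring function model_score, a background
    # dataset for estimating coalition values by marginal averaging, and few enough
    # features (here 8) that enumerating all coalitions is feasible.
    from itertools import combinations
    from math import factorial
    import numpy as np

    def coalition_value(model_score, x, background, coalition):
        """Estimate v(S): average model score when features in S are fixed to the
        values of sample x and the rest are drawn from the background data."""
        X = background.copy()
        X[:, list(coalition)] = x[list(coalition)]
        return model_score(X).mean()

    def exact_shapley(model_score, x, background):
        """Brute-force exact Shapley values over all 2^n coalitions."""
        n = x.shape[0]
        phi = np.zeros(n)
        for i in range(n):
            others = [j for j in range(n) if j != i]
            for size in range(n):
                for S in combinations(others, size):
                    # Shapley weight |S|! (n - |S| - 1)! / n!
                    weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                    gain = (coalition_value(model_score, x, background, S + (i,))
                            - coalition_value(model_score, x, background, S))
                    phi[i] += weight * gain
        return phi

An approximate counterpart could be obtained, for example, with the shap package's KernelExplainer (shap.KernelExplainer(model_score, background).shap_values(x)); the comparisons reported here concern discrepancies between such approximations and the exact values.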
Results
Exact Shapley values calculated for data collected from a cohort of 256 patients showed that the relative importance of attributes for test classification varied by sample. While all eight features used in the VeriStrat® test contributed equally to the classification of some samples, other samples showed more complex patterns of attribute importance. Exact Shapley values and Shapley-based interaction metrics provided interpretable explanations of classification at the sample or patient level, and patient subgroups could be defined by comparing Shapley value profiles between patients. LIME and SHAP approximation approaches, even those designed to account for correlations between attributes, produced results that were quantitatively and, in some cases, qualitatively different from the exact Shapley values.
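As one hypothetical way of defining such subgroups (not necessarily the procedure used in the study), per-patient Shapley value profiles could be clustered; the profile matrix and number of groups below are illustrative assumptions.

    # Minimal sketch (not the paper's procedure) of grouping patients by their
    # Shapley value profiles. Assumes phi_matrix has shape (n_patients, n_features)
    # and contains exact Shapley values computed as above.
    from sklearn.cluster import KMeans

    def shapley_subgroups(phi_matrix, n_groups=3):
        """Assign each patient to a subgroup by k-means clustering of Shapley
        profiles; the number of groups is an illustrative choice."""
        return KMeans(n_clusters=n_groups, n_init=10).fit_predict(phi_matrix)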
Conclusions
Shapley values can be used to determine the relative importance of input attributes to the result generated by a multivariate molecular diagnostic test for an individual sample or patient. Patient subgroups defined by Shapley value profiles may motivate translational research. However, correlations inherent in molecular data and the typically small ML training sets available for molecular diagnostic test development may cause some approximation methods to produce Shapley value estimates that differ both qualitatively and quantitatively from the exact values. Hence, caution is advised when using approximate methods to evaluate Shapley explanations of the results of molecular diagnostic tests.