Flow cytometry datasets consisting of peripheral blood and bone marrow samples for the evaluation of explainable artificial intelligence methods

Thrun, Michael C.; Hoffmann, Jörg; Röhnert, Maximilian Alexander; Bonin, Malte von; Oelschlägel, Uta; Brendel, Cornelia; Ultsch, Alfred

doi:10.1016/j.dib.2022.108382

Cited by 7 publications (7 citation statements)

References: 12 publications


“…The performance of Flow XAI for learning WHO lymphoma classes was evaluated on two datasets from two independent diagnostic centers with different compositions of B-cell antigen panels through 19,493 and 638 total cases. Clinical evidence for correct classification was collected carefully for both datasets, including genetic and histopathological information, as previously published. We conventionally evaluated the performance and computed 100 cross-validation trials with class-balanced 80/20 splits between training and test data because this is considered the standard approach in machine learning and pattern recognition for estimating the average error [29,30,47].…”

Section: Self-organized Lymphoma Classification Using Swarm Intelligence (mentioning; confidence: 99%)

Trustworthy and Self-explanatory Artificial Intelligence for the Classification of Non-Hodgkin Lymphoma by Immunophenotype

Thrun, Hoffmann, Krause, et al., 2024. Preprint (self-cite).

Diagnostic immunophenotyping of malignant non-Hodgkin lymphoma (NHL) by multiparameter flow cytometry (MFC) relies on highly trained physicians. Artificial intelligence (AI) systems have been proposed for this diagnostic task, but they often require more training examples than are usually available. In contrast, Flow XAI reduces the amount of training data needed by a factor of 100. It selects and reports diagnostically relevant cell populations and expression patterns in a discernible and clear manner so that immunophenotyping experts can understand the rationale behind the AI's decisions. A self-organized and unsupervised view of the complex multidimensional MFC data provides information about the immunophenotypic structures in the data. Flow XAI integrates human expert knowledge into its decision process. It reports a self-competence estimation for each case and delivers human-understandable explanations for its decisions. Flow XAI outperformed comparable AI systems in qualitative and quantitative assessments. This self-explanatory AI system can be used for real-world AI lymphoma immunophenotyping.
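The snippet above quotes an evaluation protocol of 100 cross-validation trials with class-balanced 80/20 training/test splits. The Python sketch below illustrates that protocol under stated assumptions. It is not the Flow XAI code. "Class-balanced" is read here as stratified splitting, meaning each class is split 80/20 on its own. The names `stratified_splits` and `mean_error` and the `fit_predict` callable are hypothetical, and so are the labels in the usage example.

```python
import random
from collections import Counter, defaultdict


def stratified_splits(labels, n_trials=100, train_frac=0.8, seed=0):
    """Yield (train_idx, test_idx) index lists for repeated stratified holdout splits.

    Each class is shuffled and split separately, so the training and test sets
    keep the class proportions of the full dataset (our reading of "class-balanced").
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    for _ in range(n_trials):
        train, test = [], []
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            n_train = round(train_frac * len(shuffled))
            train.extend(shuffled[:n_train])
            test.extend(shuffled[n_train:])
        yield sorted(train), sorted(test)


def mean_error(labels, fit_predict, n_trials=100, seed=0):
    """Average the test error over all trials.

    fit_predict(train_idx, test_idx) is a hypothetical callable that trains on
    train_idx and returns one predicted label per index in test_idx.
    """
    errors = []
    for train, test in stratified_splits(labels, n_trials, seed=seed):
        predictions = fit_predict(train, test)
        wrong = sum(pred != labels[i] for pred, i in zip(predictions, test))
        errors.append(wrong / len(test))
    return sum(errors) / len(errors)


# Usage with made-up labels and a majority-class baseline as the "classifier".
labels = ["A"] * 50 + ["B"] * 30 + ["C"] * 20


def majority(train, test):
    top = Counter(labels[i] for i in train).most_common(1)[0][0]
    return [top] * len(test)


print(mean_error(labels, majority))  # 0.5: only the 10 "A" cases among 20 test cases are right
```

Stratifying keeps every test set at 10/6/4 cases for the three classes. As a result, the baseline's error is identical in every trial. A real classifier's error would vary across splits, and that variation is what averaging over 100 trials smooths out.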


“…The third dataset contained healthy BM samples and leukemia BM samples because the diagnosis of leukemia based on BM samples is a basic task. For details about the measurement process and the structures in the data, we refer to [68].…”

Section: Data Description (mentioning; confidence: 99%)

“…The synthetic dataset served as a basic test of the performance of the introduced algorithms. As stated in the data description [68], the flow cytometry data were derived from originally obtained diagnostic sample measurements to obtain acute myeloid leukemia (AML) information at the minimal residual disease (MRD) level (cf. [69,70]).…”

Section: Data Description (mentioning; confidence: 99%)

“…The Dresden dataset comprised N = 22 sample files for PB and N = 22 samples for BM. Each sample file contained more than 100,000 events for a set of features extensively described in [68]. It should be noted that the goal of the investigated XAI algorithms is not to predict single events within each data file but to predict the class of the data file itself.…”

Section: Marburg and Dresden Data (mentioning; confidence: 99%)


An Explainable AI System for the Diagnosis of High-Dimensional Biomedical Data

Ultsch, Hoffmann, Röhnert, et al., 2024. BioMedInformatics (self-cite).

State-of-the-art flow cytometry data samples typically consist of measurements of 10 to 30 features for more than 100,000 cell “events”. Artificial intelligence (AI) systems are able to diagnose such data with almost the same accuracy as human experts. However, such systems face one central challenge: their decisions have far-reaching consequences for the health and lives of people. Therefore, the decisions of AI systems need to be understandable and justifiable by humans. In this work, we present a novel explainable AI (XAI) method called algorithmic population descriptions (ALPODS), which is able to classify (diagnose) cases based on subpopulations in high-dimensional data. ALPODS is able to explain its decisions in a form that is understandable to human experts. For the identified subpopulations, fuzzy reasoning rules expressed in the typical language of domain experts are generated. A visualization method based on these rules allows human experts to understand the reasoning used by the AI system. A comparison with a selection of state-of-the-art XAI systems shows that ALPODS operates efficiently on known benchmark data and on everyday routine case data.
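The Dresden snippet stresses that the investigated XAI algorithms predict the class of a whole sample file, not of single events. The Python sketch below illustrates that distinction with a deliberately simple method that is not ALPODS. Each file's events are summarized by per-feature medians, and the file is then assigned to the nearest class centroid. The function names and toy data are hypothetical.

```python
import math
from statistics import median


def summarize(events):
    """Collapse one sample file (a list of events, each a list of feature values)
    into a single vector of per-feature medians."""
    n_features = len(events[0])
    return [median(event[j] for event in events) for j in range(n_features)]


def fit_centroids(files, labels):
    """Average the per-file summaries of each class into one centroid per class."""
    sums, counts = {}, {}
    for events, label in zip(files, labels):
        vec = summarize(events)
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict_file(events, centroids):
    """Assign the whole file, not its individual events, to the nearest class centroid."""
    vec = summarize(events)
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


# Usage with two tiny hypothetical files of two features each.
healthy_file = [[1.0, 2.0], [1.2, 2.1], [0.9, 1.9]]
leukemia_file = [[5.0, 0.5], [5.2, 0.4], [4.8, 0.6]]
centroids = fit_centroids([healthy_file, leukemia_file], ["healthy", "leukemia"])
print(predict_file([[4.9, 0.5], [5.1, 0.7]], centroids))  # leukemia
```

Median summaries discard exactly the subpopulation structure that methods such as ALPODS are designed to find. The sketch only shows where the file-level label enters, not how a competitive classifier would use the events.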


“…For the present experiments, d = 4 variables, including the value of the forward scatter (FS) and cytological markers (CD) called a, b and d for nondisclosure reasons, were downsampled from originally n = 111,686 cells obtained from 100 patients with chronic lymphocytic leukemia (CLL) and 100 healthy control subjects to n = 3,000 instances. This data set is available in the R library "EDOtrans" as "FACSdata" and consists of a subsample of a larger data set published at https://data.mendeley.com/datasets/jk4dt6wprv/1 (accessed October 12, 2022) [45].…”

Section: Cell Surface Marker Leukemia Data Set (mentioning; confidence: 99%)

Comparative assessment of projection and clustering method combinations in the analysis of biomedical data

Lötsch, Ultsch, 2023. Preprint (self-cite).

Background: Clustering on projected data is a common component of the analysis of biomedical research datasets. Among projection methods, principal component analysis (PCA) is the most commonly used. It focuses on the dispersion (variance) of the data, whereas clustering attempts to identify concentrations (neighborhoods) within the data. These may be conflicting aims. This report re-evaluates combinations of PCA and other common projection methods with common clustering algorithms. Methods: PCA, independent component analysis (ICA), isomap, multidimensional scaling (MDS), and t-distributed stochastic neighbor embedding (t-SNE) were combined with common clustering algorithms (partitioning: k-means, k-medoids; hierarchical: single, Ward's, and average linkage). Projections and clusterings were assessed visually by tessellating the two-dimensional projection plane with Voronoi cells and calculating common measures of cluster quality. Clustering on projected data was evaluated on nine artificial and five real biomedical datasets. Results: None of the combinations always gave correct results in terms of capturing the prior classifications in the projections and clusters. Visual inspection of the results is therefore essential. PCA was never ranked first, but was consistently outperformed or equaled by neighborhood-based methods such as t-SNE or manifold learning techniques such as isomap. Conclusions: The results do not support PCA as the standard projection method prior to clustering. Instead, several alternatives with visualization of the projection and clustering results should be compared. A visualization is proposed that combines a Voronoi tessellation of the projection plane according to the clustering with a color coding of the projected data points according to the prior classes. This can be used to find the best combination of data projection and clustering in a given data set.
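The cell surface marker snippet quoted with this entry describes downsampling n = 111,686 cells to n = 3,000 instances. It does not say how the subsample was drawn. The Python sketch below assumes simple random sampling without replacement, with a fixed seed for reproducibility. `downsample` is a hypothetical helper, not part of EDOtrans, and the cell table is a synthetic placeholder.

```python
import random


def downsample(rows, n, seed=42):
    """Draw n rows uniformly at random without replacement, keeping their original order.

    Uniform sampling is an assumption; class-proportional (stratified) sampling
    would instead draw from the CLL and control cells separately.
    """
    if n > len(rows):
        raise ValueError("cannot draw more rows than are available")
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(rows)), n))
    return [rows[i] for i in keep]


# Usage: a synthetic stand-in for a table of 111,686 cells with d = 4 columns
# (FS and markers a, b, d); the values themselves are placeholders.
cells = [(float(i), 0.0, 0.0, 0.0) for i in range(111_686)]
subset = downsample(cells, 3000)
print(len(subset))  # 3000
```

Sorting the sampled indices keeps the subsample in the original cell order. The fixed seed makes repeated runs return the same 3,000 rows.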

