The identification of phenotypic changes in breast cancer (BC) histopathology on account of corresponding molecular changes is of significant clinical importance in predicting disease outcome. One such example is the presence of lymphocytic infiltration (LI) in histopathology, which has been correlated with nodal metastasis and distant recurrence in HER2+ BC patients. In this paper, we present a computer-aided diagnosis (CADx) scheme to automatically detect and grade the extent of LI in digitized HER2+ BC histopathology. Lymphocytes are first automatically detected by a combination of region growing and Markov random field algorithms. Using the centers of individual detected lymphocytes as vertices, three graphs (Voronoi diagram, Delaunay triangulation, and minimum spanning tree) are constructed, and a total of 50 image-derived features describing the arrangement of the lymphocytes are extracted from each sample. A nonlinear dimensionality reduction scheme, graph embedding (GE), is then used to project the high-dimensional feature vector into a reduced 3-D embedding space. A support vector machine classifier is used to discriminate samples with high and low LI in the reduced-dimensional embedding space. A total of 41 HER2+ hematoxylin-and-eosin-stained images obtained from 12 patients were considered in this study. Across more than 100 three-fold cross-validation trials, the architectural feature set successfully distinguished samples with high and low LI with a classification accuracy greater than 90%. The popular unsupervised Varma-Zisserman texton-based classification scheme was used for comparison and yielded a classification accuracy of only 60%. Additionally, the projection of the 50 image-derived features for all 41 tissue samples into a reduced-dimensional space via GE allowed for the visualization of a smooth manifold that revealed a continuum between low, intermediate, and high levels of LI.
Since the extent of LI in BC biopsy specimens is a known prognostic indicator, our CADx scheme could potentially help clinicians determine disease outcome and make better therapy recommendations for patients with HER2+ BC.
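The graph-construction step above can be sketched in SciPy as follows. This is a minimal illustration, not the paper's implementation: a small subset of graph statistics (edge-length moments and region counts) stands in for the 50 architectural features, and the random centroids are placeholder data.

```python
import numpy as np
from scipy.spatial import Delaunay, Voronoi
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def architectural_features(centroids):
    """Summarize lymphocyte arrangement via Delaunay, Voronoi, and MST
    graphs built on nuclear centroids (illustrative feature subset)."""
    pts = np.asarray(centroids, dtype=float)
    tri = Delaunay(pts)
    # Collect the unique edges of the Delaunay triangulation.
    edges = set()
    for simplex in tri.simplices:
        for i in range(3):
            a, b = sorted((simplex[i], simplex[(i + 1) % 3]))
            edges.add((a, b))
    edge_len = np.array([np.linalg.norm(pts[a] - pts[b]) for a, b in edges])
    # Minimum spanning tree over the complete Euclidean distance graph.
    mst = minimum_spanning_tree(squareform(pdist(pts)))
    vor = Voronoi(pts)
    return {
        "delaunay_mean_edge": edge_len.mean(),
        "delaunay_std_edge": edge_len.std(),
        "mst_mean_edge": mst.data.mean(),
        "mst_std_edge": mst.data.std(),
        "n_voronoi_regions": len(vor.regions),
    }

rng = np.random.default_rng(0)
feats = architectural_features(rng.uniform(0, 100, size=(40, 2)))
```

In the full pipeline, a feature vector of this kind would be projected to 3-D via graph embedding before classification.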
In this paper we present a high-throughput system for detecting regions of carcinoma of the prostate (CaP) in histological sections (HSs) from radical prostatectomies (RPs) using probabilistic pairwise Markov models (PPMMs), a novel type of Markov random field (MRF). At diagnostic resolution a digitized HS can contain 80,000 × 70,000 pixels, far too many for current automated Gleason grading algorithms to process. However, grading can be separated into two distinct steps: 1) detecting cancerous regions and 2) grading these regions. The detection step does not require diagnostic resolution and can be performed much more quickly. Thus, we introduce a CaP detection system capable of analyzing an entire digitized whole-mount HS (2 × 1.75 cm²) in under three minutes (on a desktop computer) while achieving a CaP detection sensitivity and specificity of 0.87 and 0.90, respectively. We obtain this high throughput by tailoring the system to analyze the HSs at low resolution (8 µm per pixel). This motivates the following algorithm: Step 1) glands are segmented; Step 2) the segmented glands are classified as malignant or benign; and Step 3) the malignant glands are consolidated into continuous regions. The classification of individual glands leverages two features: gland size and the tendency for proximate glands to share the same class. The latter feature describes a spatial dependency, which we model using a Markov prior. Typically, Markov priors are expressed as the product of potential functions. Unfortunately, potential functions are mathematical abstractions, and constructing priors through their selection becomes an ad hoc procedure, resulting in simplistic models such as the Potts model. To address this problem, we introduce PPMMs, which formulate priors in terms of probability density functions, allowing the creation of more sophisticated models.
To demonstrate the efficacy of our CaP detection system and assess the advantages of using a PPMM prior instead of the Potts model, we alternately incorporate both priors into our algorithm and rigorously evaluate system performance, extracting statistics from over 6000 simulations run across 40 RP specimens.
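Step 2 of the algorithm, combined with the Potts baseline that PPMMs are compared against, can be sketched as follows. The Gaussian size likelihoods, their means and variances, the smoothing weight `beta`, and the toy gland data are all illustrative assumptions, not the paper's values; iterated conditional modes (ICM) stands in as a generic MRF optimizer.

```python
import numpy as np

def icm_potts(sizes, neighbors, mu=(60.0, 25.0), sigma=(15.0, 10.0),
              beta=1.5, n_iter=10):
    """Label glands as benign (0) or malignant (1) from gland size,
    with a Potts smoothness prior optimized by ICM."""
    sizes = np.asarray(sizes, dtype=float)
    # Negative log-likelihood of each gland size under each class.
    nll = np.stack([0.5 * ((sizes - m) / s) ** 2 + np.log(s)
                    for m, s in zip(mu, sigma)], axis=1)
    labels = nll.argmin(axis=1)          # maximum-likelihood initialization
    for _ in range(n_iter):
        for i in range(len(sizes)):
            cost = nll[i].copy()
            # Potts prior: penalty beta for each disagreeing neighbor.
            for c in (0, 1):
                cost[c] += beta * sum(labels[j] != c for j in neighbors[i])
            labels[i] = cost.argmin()
    return labels

# Four glands in a chain; the lone small gland at index 1 is smoothed
# toward its neighbors' class by the spatial prior.
sizes = [62, 35, 58, 61]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
labels = icm_potts(sizes, neighbors)
```

A PPMM would replace the Potts penalty with a learned probability density over neighboring label configurations, which is what allows the richer models described above.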
Background: Supervised classifiers for digital pathology can improve the ability of physicians to detect and diagnose diseases such as cancer. Generating training data for classifiers is problematic, since only domain experts (e.g. pathologists) can correctly label ground truth data. Additionally, digital pathology datasets suffer from the "minority class problem", in which exemplars from the non-target class outnumber those from the target class, biasing the classifier and reducing accuracy. In this paper, we develop a training strategy combining active learning (AL) with class balancing. AL identifies unlabeled samples that are "informative" (i.e. likely to increase classifier performance) for annotation, avoiding non-informative samples. This yields high accuracy with a smaller training set than random learning (RL). Previous AL methods have not explicitly accounted for the minority class problem in biomedical images; pre-specifying a target class ratio mitigates this training bias. Finally, we develop a mathematical model to predict the number of annotations (cost) required to achieve balanced training classes. In addition to predicting training cost, the model reveals the theoretical properties of AL in the context of the minority class problem. Results: Using this class-balanced AL training strategy (CBAL), we build a classifier to distinguish cancer from non-cancer regions on digitized prostate histopathology. Our dataset consists of 12,000 image regions sampled from 100 biopsies (58 prostate cancer patients). We compare CBAL against: (1) unbalanced AL (UBAL), which uses AL but ignores class ratio; (2) class-balanced RL (CBRL), which uses RL with a specified class ratio; and (3) unbalanced RL (UBRL). The CBAL-trained classifier yields 2% greater accuracy and 3% higher area under the receiver operating characteristic curve (AUC) than the alternatively-trained classifiers.
Our cost model accurately predicts the number of annotations necessary to obtain balanced classes, as verified against empirically observed costs. Finally, we find that over-sampling the minority class yields a marginal improvement in classifier accuracy, but this improvement comes at the expense of greater annotation cost. Conclusions: We have combined AL with class balancing to yield a general training strategy applicable to most supervised classification problems where the dataset is expensive to obtain and suffers from the minority class problem. An intelligent training strategy is a critical component of supervised classification, and the integration of AL with an intelligent choice of class ratios, together with a general cost model, will help researchers plan the training process more quickly and effectively.
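One round of class-balanced active learning can be sketched as follows. This is a minimal illustration under stated assumptions: uncertainty sampling (distance of the predicted probability from 0.5) stands in for the paper's informativeness measure, a logistic-regression classifier and synthetic data stand in for the real classifier and histopathology regions, and the `cbal_round` name and its parameters are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def cbal_round(clf, X_pool, y_pool, labeled_idx, batch=10, target_ratio=0.5):
    """Rank unlabeled samples by uncertainty, then accept annotations
    until the batch matches the target class ratio. The oracle here is
    y_pool; in practice a pathologist labels each queried sample."""
    unlabeled = np.setdiff1d(np.arange(len(X_pool)), labeled_idx)
    prob = clf.predict_proba(X_pool[unlabeled])[:, 1]
    order = unlabeled[np.argsort(np.abs(prob - 0.5))]  # most uncertain first
    want = {1: int(round(batch * target_ratio))}
    want[0] = batch - want[1]
    chosen = []
    for i in order:
        c = int(y_pool[i])            # annotation cost is incurred here
        if want[c] > 0:
            chosen.append(i)
            want[c] -= 1
        if len(chosen) == batch:
            break
    return np.concatenate([labeled_idx, chosen])

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + 0.3 * rng.normal(size=500) > 0.8).astype(int)
# Seed the labeled set with ten exemplars of each class.
init = np.concatenate([np.where(y == 0)[0][:10], np.where(y == 1)[0][:10]])
clf = LogisticRegression().fit(X[init], y[init])
labeled = cbal_round(clf, X, y, init, batch=10)
```

Note that samples rejected for exceeding the class quota still cost an annotation in practice, which is exactly the cost the paper's model predicts.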
The demand for personalized health care requires a wide range of diagnostic tools for determining patient prognosis and theragnosis (response to treatment). These tools present us with data that are both multimodal (imaging and non-imaging) and multiscale (proteomics, histology). By utilizing the information in these sources concurrently, we expect significant improvement in predicting patient prognosis and theragnosis. However, a prerequisite to realizing this improvement is the ability to effectively and quantitatively combine information from disparate sources. In this paper, we present a general fusion framework (GFF) aimed at building a combined knowledge representation for predicting disease recurrence. To the best of our knowledge, GFF represents the first formal attempt to fuse biomedical image and non-image information directly at the data level, as opposed to the decision level, thus preserving more subtle contributions in the original data. GFF represents the different data streams in separate embedding spaces via the application of dimensionality reduction (DR). Data fusion is then implemented by combining the individual reduced embedding spaces. As a proof of concept, we evaluate the GFF by combining protein expression measurements from mass spectrometry with histological image signatures to predict prostate cancer (CaP) recurrence following therapy in 6 CaP patients. Preliminary results suggest that GFF offers an intelligent way to fuse image and non-image data structures for making prognostic and theragnostic predictions.
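A data-level fusion step in the spirit of GFF might be sketched as follows. This is a sketch under stated assumptions: PCA stands in for the nonlinear DR the framework would use, the synthetic "proteomic" and "histology" matrices are placeholder data, and the `fuse_modalities` name, embedding dimension, and scaling scheme are illustrative choices rather than the paper's specification.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

def fuse_modalities(X_proteomic, X_histology, n_dim=3):
    """Reduce each modality into its own low-dimensional embedding,
    then concatenate the embeddings (data-level fusion)."""
    e1 = PCA(n_components=n_dim).fit_transform(X_proteomic)
    e2 = PCA(n_components=n_dim).fit_transform(X_histology)
    # Rescale each embedding so neither modality dominates the fused space.
    e1 /= np.abs(e1).max()
    e2 /= np.abs(e2).max()
    return np.hstack([e1, e2])

rng = np.random.default_rng(2)
y = np.repeat([0, 1], 10)                      # recurrence vs. non-recurrence
Xp = rng.normal(size=(20, 100)) + y[:, None]   # mass-spectrometry proxy
Xh = rng.normal(size=(20, 50)) + y[:, None]    # image-signature proxy
fused = fuse_modalities(Xp, Xh)
acc = SVC().fit(fused, y).score(fused, y)      # training accuracy only
```

The design choice illustrated here is the key contrast with decision-level fusion: the classifier sees one joint embedding space rather than a vote over per-modality predictions.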