We developed a method to quantify the shape of liver lesions in CT images and evaluated its performance for retrieving images with similarly shaped lesions. We employed a machine learning method to combine several shape descriptors and defined the similarity measure for a pair of shapes as a weighted combination of distances, one calculated from each feature. We created a dataset of 144 simulated shapes, established several reference standards for similarity, and computed the optimal weights so that the retrieval result agrees best with the reference standard. We then evaluated our method on a clinical database of 79 portal-venous-phase CT liver images, for which we derived a reference standard of similarity from radiologists' visual evaluation. Normalized Discounted Cumulative Gain (NDCG) was calculated to compare the retrieved ordering with the expected ordering based on the reference standard. For the simulated lesions, the mean NDCG values ranged from 91% to 100%, indicating that our methods for combining features were very accurate in representing true similarity. For the clinical images, the mean NDCG values were still around 90%, suggesting a strong correlation between the computed similarity and the independent similarity reference derived from the radiologists.
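The weighted combination of per-feature distances and the NDCG comparison described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the linear gain, and the log2 position discount are assumptions, since the abstract does not state which NDCG formulation or which distance functions were used.

```python
import math


def combined_distance(feature_distances, weights):
    """Weighted sum of per-feature distances for one pair of shapes.

    Both arguments are hypothetical: one distance and one learned
    weight per shape descriptor, in the same order.
    """
    return sum(w * d for w, d in zip(weights, feature_distances))


def dcg(relevances):
    """Discounted cumulative gain with a log2 position discount (assumed)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances_in_retrieved_order):
    """DCG of the retrieved ordering divided by DCG of the ideal ordering."""
    ideal = dcg(sorted(relevances_in_retrieved_order, reverse=True))
    if ideal == 0:
        return 0.0
    return dcg(relevances_in_retrieved_order) / ideal


def rank_and_score(candidates, weights):
    """Rank candidates by combined distance, then score the ranking.

    Each candidate is a (feature_distances, reference_similarity) pair,
    where reference_similarity comes from the reference standard.
    """
    ranked = sorted(candidates,
                    key=lambda c: combined_distance(c[0], weights))
    return ndcg([reference for _, reference in ranked])
```

Calling `rank_and_score` with a list of candidates returns 1.0 when the computed ordering matches the reference ordering exactly, and a smaller value as highly similar items are pushed down the list.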
In January 2016 the U.S. National Library of Medicine (NLM) announced a challenge competition calling for the development and discovery of high-quality algorithms and software that rank how well consumer images of prescription pills match reference images of pills in its authoritative RxIMAGE collection. The challenge was motivated by the need for both healthcare personnel and the general public to easily identify unknown prescription pills. Potential benefits of this capability include confirmation of a pill in settings where the documentation and medication have been separated, such as in a disaster or emergency, and confirmation of a pill when the prescribed medication changes from brand to generic, or when the shape and color of the pill change for any other reason. The data for the competition consisted of two types of images: high-quality macro photographs (the reference images) and consumer-quality photographs of the kind we expect users of a proposed application to acquire. A training dataset consisting of 2000 reference images and 5000 corresponding consumer-quality images acquired from 1000 pills was provided to challenge participants. A second dataset acquired from 1000 pills with similar distributions of shape and color was reserved as a segregated testing set. Challenge submissions were required to produce a ranking of the reference images, given a consumer-quality image as input. The winning teams were determined using the mean average precision quality metric, with the three winners obtaining mean average precision scores of 0.27, 0.09, and 0.08. In the retrieval results, the correct image was among the top five ranked images 43%, 12%, and 11% of the time, out of 5000 query (consumer) images. This is a promising initial step towards development of an NLM software system and application-programming interface facilitating pill identification.
The training dataset will continue to be freely available online at: http://pir.nlm.nih.gov/challenge/submission.html.
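The two figures reported for the challenge, mean average precision and the top-five hit rate, can be computed from a set of rankings roughly as sketched below. The function names and the data layout (one ranked list of reference-image IDs and one set of correct IDs per query) are assumptions for illustration; the challenge's exact scoring rules are not given in the abstract.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of precision values at each rank where a correct image appears."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, ref_id in enumerate(ranked_ids, start=1):
        if ref_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids)


def mean_average_precision(rankings, relevant_sets):
    """Mean of average precision over all query (consumer) images."""
    scores = [average_precision(r, rel)
              for r, rel in zip(rankings, relevant_sets)]
    return sum(scores) / len(scores)


def top_k_hit_rate(rankings, relevant_sets, k=5):
    """Fraction of queries with at least one correct image in the top k."""
    hits = sum(1 for r, rel in zip(rankings, relevant_sets)
               if any(ref_id in rel for ref_id in r[:k]))
    return hits / len(rankings)
```

For a query whose single correct reference image is ranked second, `average_precision` returns 0.5, which shows how quickly the score drops when the correct image is not ranked first.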
Motivation: A gold standard for perceptual similarity in medical images is vital to content-based image retrieval, but inter-reader variability complicates its development. Our objective was to develop a statistical model that predicts the number of readers (N) necessary to achieve acceptable levels of variability. Materials and Methods: We collected 3 radiologists' ratings of the perceptual similarity of 171 pairs of CT images of focal liver lesions, rated on a 9-point scale. We modeled the readers' scores as bimodal distributions in additive Gaussian noise and estimated the distribution parameters from the scores using an expectation maximization algorithm. We (a) sampled 171 similarity scores to simulate a ground truth and (b) simulated readers by adding noise, with standard deviation σ between 0 and 5 for each reader. We computed the mean values of 2-50 readers' scores and calculated the agreement (AGT) between these means and the simulated ground truth, and the inter-reader agreement (IRA), using Cohen's Kappa metric (κ). Results: IRA for the empirical data ranged from κ=0.41 to 0.66. For σ between 1.5 and 2.5, IRA between three simulated readers was comparable to agreement in the empirical data. For these values of σ, AGT ranged from κ=0.81 to 0.91. As expected, AGT increased with N, ranging from κ=0.83 to 0.92 for N = 2 to 50, respectively, with σ=2. Conclusion: Our simulations demonstrated that for moderate to good IRA, excellent AGT could nonetheless be obtained. This model may be used to predict the N required to accurately evaluate similarity in datasets of arbitrary size.
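The simulation loop described above (sample a ground truth, add Gaussian noise per reader, average N readers, measure agreement) can be sketched as follows. Several parts are assumptions and not the authors' method: the ground truth is drawn uniformly from the 9-point scale in place of the fitted bimodal model, scores are rounded and clipped to the scale, and the kappa shown is the unweighted form, since the abstract does not say whether a weighted variant was used.

```python
import random
from collections import Counter


def cohens_kappa(a, b):
    """Unweighted Cohen's kappa between two equal-length lists of ratings."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1 - expected)


def clip_to_scale(value, low=1, high=9):
    """Round to the nearest integer and clip to the 9-point scale."""
    return max(low, min(high, round(value)))


def simulate_reader(truth, sigma, rng):
    """One simulated reader: ground truth plus Gaussian noise (assumed form)."""
    return [clip_to_scale(t + rng.gauss(0, sigma)) for t in truth]


def agreement_with_truth(truth, n_readers, sigma, rng):
    """Kappa between the mean of n_readers' scores and the ground truth."""
    readers = [simulate_reader(truth, sigma, rng) for _ in range(n_readers)]
    means = [clip_to_scale(sum(scores) / n_readers)
             for scores in zip(*readers)]
    return cohens_kappa(means, truth)


if __name__ == "__main__":
    rng = random.Random(0)
    # Placeholder ground truth: uniform draw, NOT the fitted bimodal model.
    truth = [rng.randint(1, 9) for _ in range(171)]
    for n in (2, 10, 50):
        print(n, agreement_with_truth(truth, n, sigma=2.0, rng=rng))
```

Because of these simplifications, the printed values are not expected to reproduce the κ figures reported in the abstract; the sketch only shows the structure of the experiment.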