2022
DOI: 10.1186/s12911-022-01988-2
Evaluation of medical decision support systems (DDX generators) using real medical cases of varying complexity and origin

Abstract: Background Clinical decision support systems (CDSSs) are increasingly used in medicine, but their utility in daily medical practice is difficult to evaluate. One variant of CDSS is a generator of differential diagnoses (DDx generator). We performed a feasibility study on three different, publicly available data sets of medical cases in order to identify the frequency with which two different DDx generators provide helpful information (either by providing a list of differential diagnoses or recogni…

Cited by 6 publications (3 citation statements). References: 27 publications.
“…The finding compares favorably with existing differential diagnosis generators. A 2022 study evaluating the performance of 2 such models also using New England Journal of Medicine clinicopathological case conferences found that they identified the correct diagnosis in 58% to 68% of cases; the measure of quality was a simple dichotomy of useful vs not useful. GPT-4 provided a numerically superior mean differential quality score compared with an earlier version of one of these differential diagnosis generators (4.2 vs 3.8) …”
Section: Discussion (mentioning)
Confidence: 99%
“…We used New England Journal of Medicine clinicopathologic conferences. These conferences are challenging medical cases with a final pathological diagnosis that are used for educational purposes; they have been used to evaluate differential diagnosis generators since the 1950s …”
Section: Methods (mentioning)
Confidence: 99%
“…A current report on the use of natural language tools in neurological care [12] comments on the need for external validation that includes expert opinion to insure accuracy, reliability, and suitability for "real-world applications," yet no prior studies are cited in this regard. A recent study has shown AI with a 64% success rate for including the correct diagnosis in the differential list and 39% for the top diagnosis, [13] while another showed diagnostic generators achieved a correct diagnosis in 58 to 68% of cases [14]; however, no head-to-head studies have been reported. Factors to consider in the improved performance of the diagnostic generator NDx compared with AI and other diagnostic generators include NDx being a specialty-specific diagnostic system in clinical neurology and its use of proprietary diagnostic algorithms.…”
Section: Discussion (mentioning)
Confidence: 99%