Aim: The purpose of this study was to evaluate the quality and readability of outputs from a healthcare-specific artificial intelligence (AI) platform in response to common patient questions about the perioperative period for a common plastic surgery procedure.
Methods: Doximity GPT (Doximity, San Francisco, CA) and ChatGPT 3.5 (OpenAI, San Francisco, CA) were queried with 20 common perioperative patient questions regarding breast augmentation. The structure, content, and readability of the responses were compared using t-tests and chi-square tests, with P < 0.05 considered statistically significant.
Results: Of 80 total AI-generated outputs, ChatGPT responses were significantly longer (331 vs. 218 words, P < 0.001). Doximity GPT outputs were structured as a letter from a medical provider to the patient, whereas ChatGPT outputs were formatted as a bulleted list. Doximity GPT outputs were significantly more readable on four validated scales: Flesch-Kincaid Reading Ease (42.6 vs. 29.9, P < 0.001), Flesch-Kincaid Grade Level (grade 11.4 vs. 14.1, P < 0.001), Coleman-Liau Index (grade 14.9 vs. 17.0, P < 0.001), and Automated Readability Index (grade 11.3 vs. 14.8, P < 0.001). Regarding content, there was no difference between the two platforms in the appropriateness of the topics addressed (99% overall). The medical advice in all outputs was deemed reasonable.
Conclusion: Doximity’s AI platform produces reasonable, accurate information in response to common patient queries. With continued reinforcement learning from human feedback (RLHF), Doximity GPT has the potential to become a useful tool for plastic surgeons, assisting with a range of tasks such as providing basic information on procedures and drafting appeal letters to insurance providers.