Background and Objectives: Literature reviews are foundational to understanding medical evidence. With AI tools like ChatGPT, Bing Chat and Bard AI emerging as potential aids in this domain, this study aimed to individually assess their citation accuracy within Nephrology, comparing their performance in providing precise. Materials and Methods: We generated the prompt to solicit 20 references in Vancouver style in each 12 Nephrology topics, using ChatGPT, Bing Chat and Bard. We verified the existence and accuracy of the provided references using PubMed, Google Scholar, and Web of Science. We categorized the validity of the references from the AI chatbot into (1) incomplete, (2) fabricated, (3) inaccurate, and (4) accurate. Results: A total of 199 (83%), 158 (66%) and 112 (47%) unique references were provided from ChatGPT, Bing Chat and Bard, respectively. ChatGPT provided 76 (38%) accurate, 82 (41%) inaccurate, 32 (16%) fabricated and 9 (5%) incomplete references. Bing Chat provided 47 (30%) accurate, 77 (49%) inaccurate, 21 (13%) fabricated and 13 (8%) incomplete references. In contrast, Bard provided 3 (3%) accurate, 26 (23%) inaccurate, 71 (63%) fabricated and 12 (11%) incomplete references. The most common error type across platforms was incorrect DOIs. Conclusions: In the field of medicine, the necessity for faultless adherence to research integrity is highlighted, asserting that even small errors cannot be tolerated. The outcomes of this investigation draw attention to inconsistent citation accuracy across the different AI tools evaluated. Despite some promising results, the discrepancies identified call for a cautious and rigorous vetting of AI-sourced references in medicine. Such chatbots, before becoming standard tools, need substantial refinements to assure unwavering precision in their outputs.