Introduction: In recent years, Large Language Models (LLMs) have seen a surge of use across diverse applications in clinical medicine. In cardiology, their utility extends to enhancing ECG interpretation, data analysis, and risk prediction. This study aims to evaluate the accuracy of LLMs in answering cardiology-specific questions of varying difficulty.

Methods: This study presents a comparative analysis of three state-of-the-art LLMs, Google Bard, GPT-3.5 Turbo, and GPT-4.0, against four distinct sets of clinical scenarios of increasing complexity. The scenarios span a range of cardiovascular topics, from prevention to acute illness management and complex pathologies. The responses generated by the LLMs were assessed for clinical relevance and appropriateness, accounting for variations in patient demographics. Evaluations were conducted by a panel of experienced cardiologists.

Results: All models demonstrated an understanding of medical terminology, but their application of this knowledge varied. GPT-4.0 outperformed Google Bard and GPT-3.5 Turbo across the spectrum of cardiology-related clinical scenarios, showing a strong command of medical terminology, superior contextual understanding, and the closest alignment of its responses with current guidelines. All models were limited in their ability to reference ongoing clinical trials, underscoring the need for real-time clinical data integration.

Conclusion: LLMs showed promising ability to interpret and apply complex clinical guidelines, with the potential to enhance patient outcomes through personalized advice. However, they do not supersede human expertise and should be used with caution, as supplementary tools in clinical medicine.