[Retracted] Monitoring Cardiovascular Problems in Heart Patients Using Machine Learning

Ahdal, Ahmed Al; Rakhra, Manik; Arslan, Farrukh; Khder, Moaiad Ahmad; Patel, Binit; Rajagopal, Balaji Ramkumar; Jain, Rituraj

doi:10.1155/2023/9738123

“…Both LLaMA and Bard also provided incomplete code frequently, requiring the user to prompt for the completion. It is important to note that while GPT-4 achieved the highest accuracy of 89% and outperformed the other models, it is still lower than the state-of-the-art accuracy of 0.96% 5.…”

Section: Results (mentioning)

confidence: 89%

“…With the rapid advancements in natural language processing, Large Language Models (LLMs) have shown great promise in various domains, including bioinformatics 3,4,5 . These models have the potential to revolutionize the field by assisting researchers in complex data analysis and visualization, search for domain specific information, and generating code for diverse bioinformatics tasks.…”

Section: Discussion (mentioning)

confidence: 99%

BioLLMBench: A Comprehensive Benchmarking of Large Language Models in Bioinformatics

Sarwal, Munteanu, Suhodolschi et al. 2023

Preprint


Large Language Models (LLMs) have shown great promise in their knowledge integration and problem-solving capabilities, but their ability to assist in bioinformatics research has not been systematically evaluated. To bridge this gap, we present BioLLMBench, a novel benchmarking framework coupled with a scoring metric scheme for comprehensively evaluating LLMs in solving bioinformatics tasks. Through BioLLMBench, we conducted a thorough evaluation of 2,160 experimental runs of the three most widely used models, GPT-4, Bard and LLaMA, focusing on 36 distinct tasks within the field of bioinformatics. The tasks come from six key areas of emphasis within bioinformatics that directly relate to the daily challenges and tasks faced by individuals within the field. These areas are domain expertise, mathematical problem-solving, coding proficiency, data visualization, summarizing research papers, and developing machine learning models. The tasks also span across varying levels of complexity, ranging from fundamental concepts to expert-level challenges. Each key area was evaluated using seven specifically designed task metrics, which were then used to conduct an overall evaluation of the LLM’s response. To enhance our understanding of model responses under varying conditions, we implemented a Contextual Response Variability Analysis. Our results reveal a diverse spectrum of model performance, with GPT-4 leading in all tasks except mathematical problem solving. GPT4 was able to achieve an overall proficiency score of 91.3% in domain knowledge tasks, while Bard excelled in mathematical problem-solving with a 97.5% success rate. While GPT-4 outperformed in machine learning model development tasks with an average accuracy of 65.32%, both Bard and LLaMA were unable to generate executable end-to-end code. 
All models faced considerable challenges in research paper summarization, with none of them exceeding a 40% score in our evaluation using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score, highlighting a significant area for future improvement. We observed an increase in model performance variance when using a new chatting window compared to using the same chat, although the average scores between the two contextual environments remained similar. Lastly, we discuss various limitations of these models and acknowledge the risks associated with their potential misuse.


“…With the rapid advancements in natural language processing, Large Language Models (LLMs) have shown great promise in various domains, including bioinformatics 3,4,5 . These models have the potential to revolutionize the field by assisting researchers in complex data analysis and visualization, search for domain specific information, and generating code for diverse bioinformatics tasks.…”

Section: Discussion (mentioning)

confidence: 99%

BioLLMBench: A Comprehensive Benchmarking of Large Language Models in Bioinformatics

Mangul, Sarwal, Munteanu et al. 2024

Preprint




“…Tan, W. et al. [25]: Fetal echocardiography is a non-invasive method for diagnosing CHDs in fetuses. The machine learning approach using artificial neural networks has to minimize the risk of death due to CHD. Hussain, L. et al. [26]: The ANN prediction model was trained on medical data sets and tested for accuracy in predicting the outcome of newborns suffering from CHD. Al Ahdal, A. et al. [27]: The diagnosis model enabled accurate disease diagnosis by applying machine learning techniques to historical data, leading to more accurate predictions of patients' prognosis and treatment options…”

Section: Authors Research Highlights (mentioning)

confidence: 99%

“…For more straightforward heart problems, such as very mild stenosis or atrial septal defect, imaging studies such as an echocardiogram can diagnose the issue. Hypoplastic left heart syndrome (HLHS) is one of the most severe forms of CHD, and these children may require several surgeries or a transplant to survive [26, 27]. Sometimes these surgeries fail, and the child must receive a heart transplant to live.…”

Section: Introduction (mentioning)

confidence: 99%

A Cardiac Deep Learning Model (CDLM) to Predict and Identify the Risk Factor of Congenital Heart Disease

Pachiyannan, Alsulami, Alsadie et al. 2023

Diagnostics


Congenital heart disease (CHD) is a critical global public health concern, particularly when it comes to newborn mortality. Low- and middle-income countries face the highest mortality rates due to limited resources and inadequate healthcare access. To address this pressing issue, machine learning presents an opportunity to develop accurate predictive models that can assess the risk of death from CHD. These models can empower healthcare professionals by identifying high-risk infants and enabling appropriate care. Additionally, machine learning can uncover patterns in the risk factors associated with CHD mortality, leading to targeted interventions that prevent or reduce mortality among vulnerable newborns. This paper proposes an innovative machine learning approach to minimize newborn mortality related to CHD. By analyzing data from infants diagnosed with CHD, the model identifies key risk factors contributing to mortality. Armed with this knowledge, healthcare providers can devise customized interventions, including intensified care for high-risk infants and early detection and treatment strategies. The proposed diagnostic model utilizes maternal clinical history and fetal health information to accurately predict the condition of newborns affected by CHD. The results are highly promising, with the proposed Cardiac Deep Learning Model (CDLM) achieving remarkable performance metrics, including a sensitivity of 91.74%, specificity of 92.65%, positive predictive value of 90.85%, negative predictive value of 55.62%, and a miss rate of 91.03%. This research aims to make a significant impact by equipping healthcare professionals with powerful tools to combat CHD-related newborn mortality, ultimately saving lives and improving healthcare outcomes worldwide.
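The performance measures named in this abstract are standard confusion-matrix quantities. The following is a minimal sketch using the textbook definitions, not code from the cited paper; the function name `diagnostic_metrics` and the counts passed to it are hypothetical and chosen only for illustration.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Textbook confusion-matrix metrics (assumed definitions, not the paper's code).

    tp, fp, tn, fn are counts of true positives, false positives,
    true negatives, and false negatives.
    """
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value (precision)
        "npv": tn / (tn + fn),          # negative predictive value
        "miss_rate": fn / (fn + tp),    # false negative rate
    }


# Hypothetical counts, for illustration only.
metrics = diagnostic_metrics(tp=90, fp=10, tn=80, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Under these definitions the miss rate equals one minus the sensitivity, so a reported miss rate can be checked against the reported sensitivity.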



Cited by 25 publications

References 35 publications
