Code cloning refers to the duplication of source code. It is the most common way of reusing source code in software development. If a bug is identified in one segment of code, all similar segments need to be checked for the same bug. Cloning may therefore propagate bugs and significantly increase maintenance cost. For this reason, code clone detection (CCD) has become an active area of research, and there is a strong need to investigate the latest techniques, trends, and tools in the domain. In this paper, we comprehensively inspect the latest tools and techniques used to detect code clones. In particular, a systematic literature review (SLR) is performed to select and investigate 54 studies pertaining to CCD. The selected studies are grouped into six categories by relevance: textual approaches (12), lexical approaches (8), tree-based approaches (3), metric-based approaches (7), semantic approaches (7), and hybrid approaches (17). We identify and analyze 26 CCD tools, of which 13 are existing and 13 are proposed or developed. Moreover, 62 open-source subject systems whose source code is used for CCD are presented. We conclude that several studies detect Type-1, Type-2, Type-3, and Type-4 clones individually. However, novel approaches with complete tool support are needed to detect all four types of clones collectively. Furthermore, more approaches are needed to simplify the construction of a program dependency graph (PDG) when detecting Type-4 clones.
INDEX TERMS: CCD, SLR, code clone detection, CCD tools, code clone types.
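As an illustration of what a lexical approach might look like, the following minimal sketch compares two Python fragments by their token streams. The abstract does not define the clone types or describe any particular tool, so the sketch assumes the commonly used definitions: Type-1 clones differ only in whitespace, layout, and comments, while Type-2 clones additionally differ in identifiers and literals. The function names `token_stream` and `clone_type` are hypothetical and are not taken from any of the surveyed studies.

```python
import io
import keyword
import tokenize


def token_stream(source, abstract=False):
    """Return the token sequence of a Python fragment.

    Comments and blank lines are dropped. With abstract=True, identifiers
    become "ID" and number/string literals become "LIT".
    """
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER):
            continue  # layout and comments are ignored
        if tok.type in (tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT):
            # Keep block structure, but not the exact whitespace used.
            out.append(tokenize.tok_name[tok.type])
        elif abstract and tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")
        elif abstract and tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")
        else:
            out.append(tok.string)
    return out


def clone_type(a, b):
    """Classify a pair of fragments as "type1", "type2", or None (assumed definitions)."""
    if token_stream(a) == token_stream(b):
        return "type1"
    if token_stream(a, abstract=True) == token_stream(b, abstract=True):
        return "type2"
    return None


original = "def add(x, y):\n    return x + y\n"
reformatted = "def add(x, y):  # sum\n\n    return x   +   y\n"
renamed = "def total(a, b):\n    return a + b\n"
changed = "def add(x, y):\n    return x * y\n"

print(clone_type(original, reformatted))
print(clone_type(original, renamed))
print(clone_type(original, changed))
```

Exact token matching of this kind cannot find Type-3 clones (fragments with added or removed statements) or Type-4 clones (different code with the same behavior), which is one reason tree-based, semantic, and hybrid approaches exist.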
Cardiovascular diseases are considered the most life-threatening syndromes, with the highest mortality rate globally. Over time, they have become very common and are now overstretching the healthcare systems of many countries. The major risk factors for cardiovascular diseases are high blood pressure, family history, stress, age, gender, cholesterol, Body Mass Index (BMI), and an unhealthy lifestyle. Based on these factors, researchers have proposed various approaches for early diagnosis. However, the accuracy of the proposed techniques needs improvement, given the inherent criticality and life-threatening risks of cardiovascular diseases. In this article, the MaLCaDD (Machine Learning based Cardiovascular Disease Diagnosis) framework is proposed for the effective prediction of cardiovascular diseases with high precision. The framework first handles missing values (via the mean replacement technique) and data imbalance (via the Synthetic Minority Over-sampling Technique, SMOTE). Subsequently, the Feature Importance technique is used for feature selection. Finally, an ensemble of Logistic Regression and K-Nearest Neighbor (KNN) classifiers is proposed for prediction with higher accuracy. The framework is validated on three benchmark datasets (Framingham, Heart Disease, and Cleveland), achieving accuracies of 99.1%, 98%, and 95.5%, respectively. Finally, the comparative analysis shows that MaLCaDD predictions are more accurate (with a reduced set of features) than those of existing state-of-the-art approaches. Therefore, MaLCaDD is highly reliable and can be applied in real environments for the early diagnosis of cardiovascular diseases.
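The first two preprocessing steps named in the abstract, mean replacement and SMOTE, can be sketched in a few lines of standard-library Python. This is a simplified illustration under stated assumptions, not the MaLCaDD implementation: the abstract gives no parameters, so the neighbor count `k`, the random choice of base sample, and the function names `impute_mean` and `smote` are all hypothetical. Feature selection and the Logistic Regression/KNN ensemble are omitted.

```python
import math
import random


def impute_mean(rows):
    """Replace missing entries (None) with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating between neighbors.

    For each new sample: pick a minority point, pick one of its k nearest
    minority neighbors, and place a point at a random position on the
    segment joining them. Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


# Toy data (illustrative values only, not from any benchmark dataset).
records = [[1.0, None], [3.0, 4.0], [None, 8.0]]
filled = impute_mean(records)
print(filled)

minority_class = [[0.0, 0.0], [1.0, 1.0], [1.0, 0.0]]
new_samples = smote(minority_class, n_new=5)
print(len(new_samples))
```

Because every synthetic point lies on a segment between two real minority samples, oversampling this way adds variety without leaving the region the minority class already occupies. In practice, a maintained library implementation would normally be preferred over a hand-written one.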
Information extraction from unstructured text segments is a complex task. Although manual information extraction often produces the best results, manually extracting biomedical data has become hard to manage because of the exponential increase in data size. Thus, there is a need for automatic tools and techniques for information extraction in biomedical text mining. Relation extraction is a significant area of biomedical information extraction that has gained much importance in the last two decades. Much work has been done on biomedical relation extraction, focusing on rule-based and machine learning techniques. In the last decade, the focus has shifted to hybrid approaches, which show better results. This research presents a hybrid feature set for the classification of relations between biomedical entities. The main contribution lies in the semantic feature set, where verb phrases are ranked using the Unified Medical Language System (UMLS) and a ranking algorithm. Two effective machine learning techniques, Support Vector Machine and Naïve Bayes, are used to classify these relations. Our approach has been validated on the standard biomedical text corpus obtained from MEDLINE 2001. In conclusion, our framework outperforms all state-of-the-art approaches used for relation extraction on the same corpus.