Dataset size is a major concern in the medical domain, where data scarcity is common. This study investigates the impact of dataset size on the overall performance of supervised classification models. We examined the performance of six models widely used in the medical field, namely support vector machine (SVM), neural networks (NN), C4.5 decision tree (DT), random forest (RF), AdaBoost (AB), and naïve Bayes (NB), on eighteen small medical UCI datasets. We further implemented three dataset size reduction scenarios on two large datasets and analyzed the performance of the models when trained on each resulting dataset with respect to accuracy, precision, recall, f-score, specificity, and area under the ROC curve (AUC). Our results indicated that the overall performance of classifiers depends on how well a dataset represents the original distribution rather than on its size. Moreover, we found that the most robust models for limited medical data are AB and NB, followed by SVM, and then RF and NN, while the least robust model is DT. Furthermore, an interesting observation is that a machine learning model's robustness to limited data does not necessarily imply that it provides the best performance compared to other models.
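As a rough illustration of the evaluation measures named in this abstract, the sketch below computes accuracy, precision, recall, f-score, and specificity from binary predictions using only the Python standard library. It is not the authors' code: the function names, the `Confusion` container, and the toy labels are assumptions made for illustration, and AUC is left out because it requires ranked scores rather than hard predictions.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts of a binary confusion matrix (hypothetical helper type)."""
    tp: int
    fp: int
    tn: int
    fn: int


def confusion(y_true, y_pred, positive=1):
    """Tally true/false positives and negatives for one positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return Confusion(tp, fp, tn, fn)


def safe_div(num, den):
    """Return 0.0 instead of raising when a denominator is zero."""
    return num / den if den else 0.0


def metrics(c):
    """Derive the five threshold-based measures from the counts."""
    precision = safe_div(c.tp, c.tp + c.fp)
    recall = safe_div(c.tp, c.tp + c.fn)
    return {
        "accuracy": safe_div(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "f_score": safe_div(2 * precision * recall, precision + recall),
        "specificity": safe_div(c.tn, c.tn + c.fp),
    }


# Toy labels, invented purely for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
scores = metrics(confusion(y_true, y_pred))
```

Reporting specificity alongside recall matters on small, imbalanced medical datasets, where accuracy alone can look high for a model that rarely predicts the minority class.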
Text is the main method of communicating information in the digital age. Messages, blogs, news articles, reviews, and opinionated information abound on the Internet. People commonly purchase products online and post their opinions about purchased items. This feedback is displayed publicly to assist others with their purchasing decisions, creating the need for a mechanism with which to extract and summarize useful information for enhancing the decision-making process. Our contribution is to improve the accuracy of extraction by combining techniques from three major areas, namely Data Mining, Natural Language Processing, and Ontologies. The proposed framework sequentially mines product aspects and users' opinions, groups representative aspects by similarity, and generates an output summary. This paper focuses on the task of extracting product aspects and users' opinions by extracting all possible aspects and opinions from reviews using natural language, ontology, and frequent "tag" sets. The proposed framework, when compared with an existing baseline model, yielded promising results.
Background: The COVID-19 outbreak has left a destructive trail around the world in terms of deaths, travel restrictions, trade deficits, and a collapsing economy, including job losses, real estate, health benefit loss, and a decrease in the quality of access to care and services in almost all sectors, as well as in the overall quality of life. The successful development of COVID-19 vaccines may hasten global post-pandemic recovery by vaccinating residents, with a particular focus on important groups, in order to decrease secondary transmission. This will facilitate the easing of enforced restrictions on global and local travel, the tourism industry, education sectors, and other aspects of social life. The efforts that Saudi Arabia made to control the epidemic were outstanding on all fronts and in all spheres, including the health, education, commerce, and tourism industries, among others. Objective: The purpose of this research was to investigate the elements that influence a traveler's decision to acquire and use a digital health passport (DHP), which was introduced by the Tawakkalna application in Saudi Arabia at the COVID-19 conference. Methods: The technology acceptance model (TAM) and the information system success model (ISSM) were the primary theoretical frameworks that guided this investigation. The constructs “perceived ease of use” (PEOU), “perceived usefulness” (PU), “information quality” (IQ), “service quality” (SQ), and “net benefit” (NB) were applied to investigate users' acceptance and use of the DHP, as well as how it contributes to facilitating travel and to public perception of using the DHP. Results: To assess the validity of the proposed model and its four assumptions, a survey was distributed through social media platforms to collect responses from nationals and residents of Saudi Arabia. The SPSS program was used to evaluate a total of 103 replies that were considered valid. The findings revealed that PEOU, PU, IQ, SQ, and NB all had favorable impacts on the use of the DHP. Conclusion: PEOU, PU, IQ, and SQ have a significant relationship with NB that affects the public's acceptance and use of the DHP. This study has established validity and reliability while testing the relationship between the variables suggested in the research model.
Network management and multimedia data mining techniques are of great interest for analyzing and improving the network traffic process. In recent times, the most complex task in a Software Defined Network (SDN), which is based on a centralized, programmable controller, is security. Therefore, monitoring network traffic is significant for identifying and revealing intrusion abnormalities in the SDN environment. Consequently, this paper provides an extensive analysis and investigation of the NSL-KDD dataset using five different clustering algorithms: K-means, Farthest First, Canopy, Density-based algorithm, and Expectation-maximization (EM). The five algorithms are compared extensively using the Waikato Environment for Knowledge Analysis (WEKA) software. Furthermore, this paper presents an SDN-based intrusion detection system using a deep learning (DL) model with the KDD (Knowledge Discovery in Databases) dataset. First, the utilized dataset is clustered into normal and four major attack categories via the clustering process. Then, a deep learning method is proposed for building an efficient SDN-based intrusion detection system. The results provide a comprehensive and well-reasoned analysis of the different kinds of attacks incorporated in the KDD dataset. Similarly, the outcomes reveal that the proposed deep learning method provides efficient intrusion detection performance compared to existing techniques. For example, the proposed method achieves a detection accuracy of 94.21% for the examined dataset.
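To make the clustering step more concrete, here is a minimal sketch of K-means (Lloyd's algorithm), the first of the five algorithms listed in this abstract. It is not WEKA's implementation and not the authors' pipeline: the function name, the caller-supplied initial centroids, and the two-dimensional toy points standing in for traffic feature vectors are all assumptions made for illustration.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm with caller-supplied initial centroids.

    points    -- list of equal-length numeric tuples
    centroids -- initial cluster centres (one tuple per cluster)
    Returns (labels, centroids) after convergence or `iterations` rounds.
    """
    centroids = [tuple(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                updated.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # assignments can no longer change
        centroids = updated
    return labels, centroids


# Toy 2-D points, invented for illustration; real records would be
# numeric feature vectors derived from the dataset.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centres = kmeans(points, centroids=[(0, 0), (10, 10)])
```

Passing the initial centroids explicitly keeps the sketch deterministic; practical implementations usually choose them randomly or with a seeding heuristic, and results can vary with that choice.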