Web scraping, or web crawling, refers to the automatic extraction of data from websites using software. It is a process of particular importance in modern fields such as Business Intelligence. Web scraping allows us to extract structured data from text such as HTML and is extremely useful when data is not provided in a machine-readable format such as JSON or XML. Scraping can be used to gather prices in near real time from retail store sites and provide further details; it can also be used to gather intelligence on illicit businesses such as darknet drug marketplaces, providing law enforcement and researchers with valuable data, such as drug prices and varieties, that would be unavailable through conventional methods. It has been found that a web scraping program yields data that is far more thorough, accurate, and consistent than manual entry. Based on these results, it is concluded that web scraping is a highly useful tool in the information age and an essential one in many modern fields. Implementing web scraping properly requires multiple technologies, such as spidering and pattern matching, which are discussed. This paper examines what web scraping is, how it works, its stages and technologies, how it relates to Business Intelligence, artificial intelligence, data science, big data, and cybersecurity, how it can be done with the Python language, some of its main benefits, and what the future of web scraping may look like, with special emphasis placed on the ethical and legal issues. Keywords: Web Scraping, Web Crawling, Python Language, Business Intelligence, Data Science, Artificial Intelligence, Big Data, Cloud Computing, Cybersecurity, Legal, Ethical.
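As a concrete illustration of the kind of extraction this abstract describes, the following is a minimal Python sketch using the common requests and BeautifulSoup libraries; the URL and CSS selectors are hypothetical placeholders, not details taken from the paper.

```python
# Minimal, hypothetical sketch of price scraping with requests + BeautifulSoup.
# The URL and the CSS class names below are illustrative placeholders only.
import requests
from bs4 import BeautifulSoup

def scrape_prices(url: str) -> list[dict]:
    """Fetch a retail listing page and extract product names and prices."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    products = []
    # Assumes each product sits in a container with class "product",
    # holding a name in class "title" and a price in class "price".
    for item in soup.select("div.product"):
        name = item.select_one(".title")
        price = item.select_one(".price")
        if name and price:
            products.append({"name": name.get_text(strip=True),
                             "price": price.get_text(strip=True)})
    return products

if __name__ == "__main__":
    for row in scrape_prices("https://example.com/products"):
        print(row)
```

A real deployment would add the spidering and pattern-matching layers mentioned above (following links between listing pages, normalizing price strings) on top of this per-page extraction step.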
Zika virus (ZIKV), the causative agent of Zika fever in humans, is an RNA virus that belongs to the genus Flavivirus. Currently, there is no approved vaccine for clinical use to combat ZIKV infection and contain the epidemic. Epitope-based peptide vaccines have a large untapped potential for improving vaccination safety, cross-reactivity, and immunogenicity. Though many attempts have been made to develop vaccines for ZIKV, none have proved successful. Epitope-based peptide vaccines can act as powerful alternatives to conventional vaccines due to their low production cost and lower reactogenic and allergenic responses. For designing an effective and viable epitope-based peptide vaccine against this deadly virus, it is essential to select antigenic T-cell epitopes, since epitope-based vaccines are considered safe. An in silico machine-learning-based approach to ZIKV T-cell epitope prediction would save considerable experimental time and effort for speedy vaccine development compared to in vivo approaches. We have therefore trained a machine-learning-based computational model to predict novel ZIKV T-cell epitopes by employing physicochemical properties of amino acids. The proposed ensemble model, based on a voting mechanism, works by blending the predictions for each class (epitope or nonepitope) from each base classifier. Predictions obtained for each class by the individual classifiers are summed, and the class with the majority vote is predicted. An odd number of classifiers is used to avoid ties in the voting. An experimentally determined ZIKV peptide sequence data set was collected from the Immune Epitope Database and Analysis Resource (IEDB) repository. The data set consists of 3,519 sequences, of which 1,762 are epitopes and 1,757 are nonepitopes. Sequence lengths range from 6 to 30 residues. For each sequence, we extracted 13 physicochemical features. The proposed ensemble model achieved sensitivity, specificity, Gini coefficient, AUC, precision, F-score, and accuracy of 0.976, 0.959, 0.993, 0.994, 0.989, 0.985, and 97.13%, respectively. To check the consistency of the model, we carried out five-fold cross-validation, and an average accuracy of 96.072% is reported. Finally, a comparative analysis of the proposed model with existing methods was carried out using a separate validation data set, suggesting that the proposed ensemble model is the better model. The proposed ensemble model will help predict novel ZIKV vaccine candidates to save lives globally and prevent future epidemic-scale outbreaks.
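To make the majority-vote mechanism concrete, the following is a hedged Python sketch of a hard-voting ensemble with five-fold cross-validation. The choice of base classifiers, hyperparameters, and the synthetic stand-in data are assumptions for illustration; the abstract does not name the base classifiers, and the real model uses 13 physicochemical features derived from IEDB peptide sequences.

```python
# Hedged sketch of a majority-vote ensemble for epitope classification,
# assuming a feature matrix X (n_samples x 13 physicochemical features)
# and labels y (1 = epitope, 0 = nonepitope). The base classifiers chosen
# here are illustrative, not those used in the paper.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def build_voting_model() -> VotingClassifier:
    # An odd number of base classifiers (three) avoids ties in hard voting.
    return VotingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
            ("lr", LogisticRegression(max_iter=1000)),
            ("knn", KNeighborsClassifier(n_neighbors=5)),
        ],
        voting="hard",  # each classifier casts one vote; the majority class wins
    )

# Synthetic stand-in data; replace with IEDB-derived physicochemical features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 13))
y = rng.integers(0, 2, size=300)

model = build_voting_model()
scores = cross_val_score(model, X, y, cv=5)  # five-fold cross-validation
print("mean CV accuracy:", scores.mean())
```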
Clustering is a promising technique for managing network resources efficiently; in vehicular communications it is used to group vehicles with similar characteristics under a selected vehicle called a Cluster Head (CH). Due to the highly dynamic topology of vehicular networks, CH selection is a challenging task. This paper therefore presents a new clustering scheme, the Efficient Cluster Head Selection (ECHS) scheme, to select the most suitable CHs. The proposed ECHS scheme introduces important conditions on how clusters are constructed before CH selection begins. For instance, under the ECHS rules the ideal CH is the vehicle that is most central in the cluster, because it will remain connected to its neighbors for as long as possible. The ECHS scheme also guarantees a proper cluster distribution across the network, so that the distance between two consecutive clusters is adjusted carefully. These conditions allow vehicles on the road to be clustered effectively and make the ECHS scheme work better than its counterparts. Simulation experiments are conducted to examine the performance of ECHS, and the results demonstrate that the scheme achieves its design objectives in terms of CH lifetime, Cluster Member Lifetime (CML), Packet Loss Ratio (PLR), Overhead for Clustering (OC), Average Packet Delay (APD), and Cluster Number (CN).

INDEX TERMS Clustering, Cluster Head, Cluster Gateway Candidate, Vehicular Ad hoc Networks.

I. INTRODUCTION Nowadays, shortcomings of traditional transportation systems are significantly reduced by employing intelligent Vehicular Ad hoc Networks (VANETs). Owing to the rapid development of wireless sensors and the Internet of Vehicles (IoV) [1], VANETs can be integrated with other technologies such as Cloud and Fog computing [2][3][4]. This integration makes VANETs easy to deploy and able to support more traffic management applications. Communication in VANETs can be divided into two categories, depending on the running applications and the requested services. The first is Vehicle-to-Vehicle (V2V) communication, commonly used when vehicles share local traffic information with each other without relying on infrastructure [5][6][7]. The second is Vehicle-to-Everything (V2X) communication, which covers any type of communication between vehicles and infrastructure nodes such as a Road-Side Unit (RSU), a Fog Node, a Base Station, or a Cloud Center. V2X communication usually helps a vehicle collect information about different zones of a city in order to detect traffic congestion and to discover congestion-free
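The centrality condition described in the ECHS abstract above can be illustrated with a short Python sketch: among a candidate cluster, select as CH the vehicle with the smallest mean distance to the other members. This is only an illustration of that one condition under assumed 2D positions, not the full ECHS scheme.

```python
# Hedged sketch of the "most central vehicle" condition for cluster-head
# selection; illustrates the centrality idea only, not the complete ECHS rules.
from dataclasses import dataclass
from math import hypot

@dataclass
class Vehicle:
    vid: int
    x: float
    y: float

def select_cluster_head(cluster: list[Vehicle]) -> Vehicle:
    """Return the member with the smallest mean distance to all other members."""
    def mean_distance(v: Vehicle) -> float:
        others = [u for u in cluster if u.vid != v.vid]
        return sum(hypot(v.x - u.x, v.y - u.y) for u in others) / len(others)
    return min(cluster, key=mean_distance)

# Three vehicles on a road segment; the middle one is selected as CH.
cluster = [Vehicle(1, 0.0, 0.0), Vehicle(2, 50.0, 5.0), Vehicle(3, 100.0, 0.0)]
print("Selected CH:", select_cluster_head(cluster).vid)  # prints 2
```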
The World Health Organization reports that heart disease is the most common cause of death globally, accounting for 17.9 million fatalities annually. Early recognition of the illness and its key symptoms is thought to be fundamental to a cure. Traditional techniques face many challenges, ranging from delayed or unnecessary treatment to incorrect diagnoses, which can affect treatment progress, increase costs, and give the disease more time to spread and harm the patient's body. Such errors can be avoided and minimized by employing ML and AI techniques. Many significant efforts have been made in recent years to improve computer-aided diagnosis (CAD) and detection applications, a rapidly growing area of research. Machine learning algorithms are especially important in CAD, where they are used to detect patterns in medical data sources and make nontrivial predictions that assist doctors and clinicians in making timely decisions. This study aims to develop multiple machine learning methods, using the UCI data set of individuals' medical attributes, to aid in the early detection of cardiovascular disease. Various machine learning techniques are used to evaluate and review results on the UCI machine learning heart disease dataset. The proposed algorithms achieved the highest accuracy, with the random forest classifier reaching 96.72% and the extreme gradient boost reaching 95.08%. This will assist doctors in taking appropriate actions. The proposed technology can only determine whether or not a person has a heart issue; the severity of heart disease cannot be determined with this method.
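For readers unfamiliar with this workflow, the following is a minimal Python sketch of training a random forest classifier on the UCI heart disease data. The file name, column names, and hyperparameters are assumptions for illustration and are not the exact configuration reported in the study.

```python
# Hedged sketch of binary heart-disease classification with a random forest;
# "heart.csv" and the "target" column are placeholders for a local copy of
# the UCI heart disease data (1 = disease present, 0 = absent).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("heart.csv")
X = df.drop(columns=["target"])
y = df["target"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

clf = RandomForestClassifier(n_estimators=300, random_state=42)
clf.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```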