A malicious URL is a link that is created to spread spams, phishing, malware, ransomware, spyware, etc. A user may download malware that can adversely affect the computer by clicking on an infected URL, or might be convinced to provide confidential information to a fraudulent website causing serious losses. These threats must be identified and handled in a decent time and in an effective way. Detection is traditionally done through the blacklist usage method, which relies on keyword matching with previously known malicious domain names stored in a repository. This method is fast and easy to implement, with the advantage of having low false-positive rates regarding previously recognized malicious URLs. However, this method cannot recognize newly created malicious URLs. To solve this problem, many machine-learning models have been used. In this paper, we introduce an effective machine learning approach that uses an ensemble learner algorithm called AdaBoost (Adaptive Boosting), combined with different algorithms that enhance detection. For datasets filtration, we used CfsSubsetEval technique, which is an algorithm that searches for a subset of features that work well together. Datasets were collected from the UNB repository; divided into four categories: spam, phishing, malware, and defacement URLs; combined with benign URLs, dataset content is based on lexical features. The experimental results indicate that the proposed approach was successful in enhancing the detection accuracy of malicious URLs with less false-positive rates for all experimental algorithms.
Delivering a reliable and high-quality software system to client is a big challenge in software development and evolution process. One of the software measures that confirm the quality of the system is the defect density. Practitioners usually need this measure during software development process or during a period of operation to indicate the reliability of software system. However, since predicting defect density before testing the modules is time consuming, managers need to build a prediction model that can help in detecting the defective modules. This process can reduce the testing cost and improve testing resources utilizations. The most intrinsic feature of software defect datasets is the data sparsity in the defect density which might bias the final prediction. Therefore, we use deep learning to build defect density prediction models and handle the inherit challenge of data sparsity in defect density. Deep learning has shown to be effective with sparse data. The constructed model has been evaluated against well-known machine learning methods over 28 public datasets. The obtained results confirmed that the deep learning model is generally more adequate than other machine models over the datasets with high and very high sparsity ratios, and competitive choice when the sparsity ratio is either medium or low INDEX TERMS Defect Density Prediction, Deep Learning, Data Sparsity, Machine Learning.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.