Context: Maritime Surveillance (MS) has received increased attention from a civilian perspective in recent years. Anomaly detection (AD) is one of many techniques available for improving safety and security in the MS domain. Maritime authorities use various confidential data sources to monitor maritime activities; however, a paradigm shift on the Internet has created new sources of data for MS. These newly identified sources, which provide publicly accessible data, are the open data sources. Using open data sources in addition to the traditional sources of data in the AD process will increase the accuracy of MS systems.
Objectives: The goal is to investigate the potential of open data as a complementary resource for AD in the MS domain. To achieve this goal, the first step is to identify the open data sources applicable to AD. Then, a framework for AD based on the integration of open and closed data sources is proposed. Finally, following the proposed framework, an AD system able to use open data sources is developed, and the accuracy of the system and the validity of its results are evaluated.
Methods: To measure the system's accuracy, an experiment is performed by means of two-stage random sampling on the vessel traffic data, and the numbers of true and false positive and negative alarms raised by the system are verified. To evaluate the validity of the system's results, the system is used for a period of time by subject matter experts from the Swedish Coastguard. The experts check the detected anomalies against the data available at the Coastguard to obtain the number of true and false alarms.
Results: The experimental outcomes indicate that the accuracy of the system is 99%. In addition, the Coastguard validation results show that, among the evaluated anomalies, 64.47% are true alarms, 26.32% are false alarms, and 9.21% belong to vessels that remain unchecked due to the lack of corresponding data in the Coastguard data sources.
Conclusions: This thesis concludes that using open data as a complementary resource for detecting anomalous behavior in the MS domain is not only feasible but will also improve the efficiency of surveillance systems by increasing their accuracy and covering some previously unseen aspects of maritime activities.
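The two evaluation figures above measure different things, and a short sketch may help keep them apart. The confusion-matrix counts below are hypothetical (the abstract reports only the 99% figure, not the underlying counts), and the function names are illustrative. The three Coastguard percentages are taken from the abstract; the share of true alarms among the alarms that could actually be checked is derived here and is not a figure reported by the authors.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all decisions (alarm or no alarm) that were correct."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    return (tp + tn) / total


def true_share_among_checked(true_pct, false_pct):
    """Share of true alarms among the alarms the experts were able to check."""
    checked = true_pct + false_pct
    if checked == 0:
        raise ValueError("no checked alarms")
    return true_pct / checked


# Hypothetical counts, chosen only so that the result matches 99%.
example_accuracy = accuracy(tp=45, fp=1, tn=945, fn=9)

# Percentages reported for the Coastguard validation.
true_pct, false_pct, unchecked_pct = 64.47, 26.32, 9.21
checked_true_share = true_share_among_checked(true_pct, false_pct)

print(f"accuracy on hypothetical counts: {example_accuracy:.2%}")
print(f"true alarms among checked alarms: {checked_true_share:.2%}")
```

Accuracy counts correct non-alarms as well as correct alarms, whereas the Coastguard validation looks only at the anomalies the system raised, so the two numbers are not directly comparable.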
The amount of software that hosts spyware has increased dramatically. To avoid legal repercussions, vendors need to inform users about the inclusion of spyware via end user license agreements (EULAs) during the installation of an application. However, this information is intentionally written in a way that is hard for users to comprehend. We investigate how to automatically discriminate between legitimate software and spyware-associated software by mining EULAs. For this purpose, we compile a data set consisting of 996 EULAs, of which 9.6% are associated with spyware. We compare the performance of 17 learning algorithms with that of a baseline algorithm on two data sets, one based on a bag-of-words model and one on a metadata model. The majority of the learning algorithms significantly outperform the baseline regardless of which data representation is used. However, a non-parametric test indicates that the bag-of-words model is more suitable than the metadata model. Our conclusion is that automatic EULA classification can be applied to assist users in making informed decisions about whether to install an application without having read the EULA. We therefore outline the design of a spyware prevention tool and suggest how to select suitable learning algorithms for the tool by using a multi-criteria evaluation approach.
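As a rough illustration of what classifying EULAs from a bag-of-words representation involves, the sketch below trains a small multinomial naive Bayes classifier using only the Python standard library. Naive Bayes is a stand-in chosen for brevity; the abstract does not say which of the 17 algorithms performed best. The four training texts, the labels, and all names (`tokenize`, `NaiveBayesBoW`) are hypothetical and are not taken from the 996-EULA data set.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesBoW:
    """Multinomial naive Bayes over word counts, with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set()
        for c in self.classes:
            self.vocab.update(self.word_counts[c])
        self.total_words = {c: sum(self.word_counts[c].values())
                            for c in self.classes}
        self.n_docs = len(docs)
        return self

    def predict(self, doc):
        # Words never seen in training carry no evidence and are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        best_class, best_score = None, -math.inf
        for c in self.classes:
            score = math.log(self.doc_counts[c] / self.n_docs)
            denom = self.total_words[c] + len(self.vocab)
            for t in tokens:
                score += math.log((self.word_counts[c][t] + 1) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Hypothetical miniature training set, invented for this example.
train_docs = [
    "this software displays advertising and collects browsing information shared with third parties",
    "we may install additional components that track usage and deliver targeted advertising",
    "you may install one copy of the software on a single computer",
    "the software is licensed not sold and provided without warranty",
]
train_labels = ["spyware", "spyware", "legitimate", "legitimate"]

model = NaiveBayesBoW().fit(train_docs, train_labels)
print(model.predict("the program collects information for targeted advertising partners"))
print(model.predict("you may install one copy on a single computer"))
```

With a real data set in which only 9.6% of the EULAs are associated with spyware, the class imbalance would also need attention when choosing an evaluation metric, since a classifier that always answers "legitimate" would already be right about 90% of the time.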
Machine learning algorithms are frequently applied in data mining applications. Many of the tasks in this domain concern high-dimensional data. Consequently, these tasks are often complex and computationally expensive. This paper presents a GPU-based parallel implementation of the Random Forests algorithm. In contrast to previous work, the proposed algorithm is based on the Compute Unified Device Architecture (CUDA). An experimental comparison between the CUDA-based algorithm (CudaRF) and state-of-the-art Random Forests algorithms (FastRF and LibRF) shows that CudaRF outperforms both FastRF and LibRF for the studied classification task.