Variety and veracity are two distinct characteristics of large-scale and heterogeneous data. It has been a great challenge to efficiently represent and process big data with a unified scheme. In this paper, a unified tensor model is proposed to represent unstructured, semi-structured, and structured data. With a tensor extension operator, various types of data are represented as sub-tensors and then merged into a unified tensor. To extract the core tensor, which is small but contains valuable information, an incremental high-order singular value decomposition (IHOSVD) method is presented. By recursively applying the incremental matrix decomposition algorithm, IHOSVD is able to update the orthogonal bases and compute the new core tensor. Analyses of the proposed method in terms of time complexity, memory usage, and approximation accuracy are provided. A case study illustrates that approximate data reconstructed from a core set containing 18% of the elements can guarantee 93% accuracy in general. Theoretical analyses and experimental results demonstrate that the proposed unified tensor model and IHOSVD method are efficient for big data representation and dimensionality reduction.
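To make the idea of a small core tensor plus orthogonal bases concrete, the following is a minimal sketch of a plain (batch) truncated HOSVD in NumPy. It is not the incremental IHOSVD method described above, which the abstract does not specify in detail; the function names (`unfold`, `mode_product`, `truncated_hosvd`, `reconstruct`) and the synthetic test tensor are assumptions made for illustration only.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring `mode` to the front and flatten the remaining axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """n-mode product: multiply `tensor` by `matrix` along axis `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)

def truncated_hosvd(tensor, ranks):
    """Return (core, bases), keeping ranks[n] left singular vectors for mode n."""
    bases = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        bases.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(bases):
        core = mode_product(core, u.T, mode)  # project onto each truncated basis
    return core, bases

def reconstruct(core, bases):
    """Expand a core tensor back to full size using the per-mode bases."""
    approx = core
    for mode, u in enumerate(bases):
        approx = mode_product(approx, u, mode)
    return approx

# Hypothetical data: a 6 x 5 x 4 tensor built to have multilinear rank (2, 2, 2).
rng = np.random.default_rng(0)
small_core = rng.standard_normal((2, 2, 2))
factors = [rng.standard_normal((dim, 2)) for dim in (6, 5, 4)]
tensor = reconstruct(small_core, factors)

core, bases = truncated_hosvd(tensor, ranks=(2, 2, 2))
relative_error = np.linalg.norm(reconstruct(core, bases) - tensor) / np.linalg.norm(tensor)
kept_fraction = core.size / tensor.size  # counts core entries only, not the bases
print(f"core shape {core.shape}, kept fraction {kept_fraction:.3f}, error {relative_error:.2e}")
```

Because the synthetic tensor is exactly low-rank, the truncated decomposition reproduces it up to floating-point error. On real data the chosen ranks trade core size against reconstruction accuracy, which is the trade-off the case study in the abstract reports.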
Machine learning-based anomaly detection approaches have attracted increasing attention in the network intrusion detection community because of their intrinsic capabilities in discovering novel attacks. However, most of today's anomaly-based IDSs generate high false positive rates and miss many attacks because of a deficiency in their ability to discriminate attacks from legitimate behaviors. In this paper, we propose an anomaly intrusion detection method using the Combined Strangeness and Isolation measure K-Nearest Neighbors (CSI-KNN) algorithm. The intrusion detection algorithm analyzes different characteristics of network data by employing two measures: strangeness and isolation. Based on these measures, a correlation unit raises intrusion alerts with associated confidence estimates. Multiple CSI-KNN classifiers work in parallel, each handling a different type of network service, so that the CSI-KNN-based NIDS can work more efficiently than one that processes all network services together.
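The abstract names the two measures but gives no formulas, so the sketch below rests on an assumed reading: strangeness is the ratio of the summed distances to the k nearest same-class points over the summed distances to the k nearest other-class points, and isolation is the summed distance to the k nearest same-class points. The function names, the toy two-dimensional data, and the choice k = 2 are all hypothetical.

```python
import math

def k_smallest_distance_sum(point, others, k):
    """Sum of the k smallest Euclidean distances from `point` to `others`."""
    distances = sorted(math.dist(point, other) for other in others)
    return sum(distances[:k])

def strangeness(point, label, training, k):
    """Assumed definition: same-class k-NN distance sum over other-class k-NN distance sum.

    Values well below 1 mean the point sits much closer to `label` than to any other class.
    """
    same = [x for x, y in training if y == label]
    other = [x for x, y in training if y != label]
    return k_smallest_distance_sum(point, same, k) / k_smallest_distance_sum(point, other, k)

def isolation(point, label, training, k):
    """Assumed definition: k-NN distance sum to the class itself (large means far from it)."""
    same = [x for x, y in training if y == label]
    return k_smallest_distance_sum(point, same, k)

# Hypothetical labelled feature vectors: two tight clusters.
training = [
    ((0.0, 0.0), "normal"), ((0.2, 0.1), "normal"), ((0.1, 0.3), "normal"),
    ((5.0, 5.0), "attack"), ((5.2, 4.9), "attack"), ((4.8, 5.1), "attack"),
]
probe = (0.1, 0.1)   # lies inside the "normal" cluster
far = (10.0, 10.0)   # closer to "attack" than to "normal", but far from both

for name, point in (("probe", probe), ("far", far)):
    for label in ("normal", "attack"):
        s = strangeness(point, label, training, k=2)
        i = isolation(point, label, training, k=2)
        print(f"{name} vs {label}: strangeness={s:.3f} isolation={i:.3f}")
```

The `far` point shows why a second measure helps under these assumed definitions: its strangeness with respect to "attack" is below 1, so a ratio-only test would accept it as attack-like, while its isolation reveals that it is distant from every known cluster.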
As the rapidly growing volume of data exceeds the capabilities of many computing infrastructures, securely processing it in the cloud has become a preferred solution that can both exploit the cloud's computing power and protect data privacy. This paper presents an approach to securely decompose a high-order tensor, a mathematical model widely used in big data applications, into a core tensor and a certain number of truncated orthogonal bases. The unstructured, semi-structured, and structured data are represented as low-order sub-tensors, which are then encrypted to cipher counterparts using the RLWE-based fully homomorphic encryption scheme. A unified high-order cipher tensor model is constructed in the cloud by collecting all the cipher sub-tensors and embedding them into a base tensor space. The cipher tensor is decomposed through a proposed Lanczos-based algorithm, in which the non-homomorphic square root operation is eliminated. Theoretical analyses of the algorithm in terms of time complexity, memory usage, decomposition accuracy, and data security are provided. Experimental results demonstrate that the proposed approach is a feasible and secure way to perform high-order tensor decomposition in the cloud for big data applications.
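For orientation, here is a minimal plaintext sketch of the textbook Lanczos tridiagonalization that such an algorithm builds on. It is not the paper's method: it works on unencrypted NumPy arrays and normalizes with a vector norm, which involves exactly the square root that the proposed variant is said to eliminate. The function name `lanczos`, the full reorthogonalization step, and the random test matrix are assumptions for illustration.

```python
import numpy as np

def lanczos(matrix, start, steps):
    """Textbook Lanczos: return (q, tri) with q.T @ matrix @ q approximately tri.

    `matrix` must be symmetric; `tri` is tridiagonal and `q` has orthonormal columns.
    """
    n = matrix.shape[0]
    q = np.zeros((n, steps))
    alphas = np.zeros(steps)
    betas = np.zeros(steps - 1)
    q[:, 0] = start / np.linalg.norm(start)
    for j in range(steps):
        w = matrix @ q[:, j]
        alphas[j] = q[:, j] @ w
        # Full reorthogonalization against all vectors so far, for numerical stability.
        w -= q[:, : j + 1] @ (q[:, : j + 1].T @ w)
        if j + 1 < steps:
            betas[j] = np.linalg.norm(w)   # the square root a homomorphic scheme cannot evaluate directly
            q[:, j + 1] = w / betas[j]
    tri = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    return q, tri

# Hypothetical input: the Gram matrix of a 6 x 20 mode unfolding. Its eigenvectors
# are the left singular vectors of the unfolding, i.e. an orthogonal basis for that mode.
rng = np.random.default_rng(1)
unfolding = rng.standard_normal((6, 20))
gram = unfolding @ unfolding.T

q, tri = lanczos(gram, start=rng.standard_normal(6), steps=6)
print(np.round(np.linalg.eigvalsh(tri), 4))
print(np.round(np.linalg.eigvalsh(gram), 4))
```

With as many steps as the matrix has rows, the small tridiagonal matrix has the same eigenvalues as the Gram matrix. With fewer steps it approximates the extreme eigenvalues, which is what makes the method attractive for truncated bases.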