An intrusion detection system (IDS) is one of the most effective ways to secure a network and prevent unauthorized access and security attacks. However, owing to the lack of adequately labeled network traffic data, researchers have proposed several feature representation models over the past three years. These models do not account for feature generalization errors when learning semantic similarity from the data distribution, which may degrade the performance of the predictive IDS model. To improve the capabilities of IDS in the era of Big Data, there is a constant need to extract the most important features from large-scale, balanced network traffic data. This paper proposes a semi-supervised IDS model that leverages untrained autoencoders to learn latent feature representations from a distribution of input data samples. Further, distance function-based clustering is used to find more compact code vectors that capture the semantic similarity between learned feature sets and minimize reconstruction loss. The proposed scheme yields an optimal feature vector and reduces feature dimensionality, significantly lowering memory requirements. Multiple test cases on the IoT dataset MQTTIOT2020 are conducted to demonstrate the potential of the proposed model. Supervised machine learning classifiers are implemented using the proposed feature representation mechanism and compared with shallow classifiers. Finally, the comparative evaluation confirms the efficacy of the proposed model, with low false positive rates, indicating that the proposed feature representation scheme positively impacts IDS performance.
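The two-stage idea (compress records into latent codes with an autoencoder, then group the codes with a distance-based clustering step) can be illustrated with a minimal NumPy sketch. It assumes a single-hidden-layer autoencoder fitted by plain gradient descent on unlabeled records only, and uses k-means with Euclidean distance as a stand-in for the paper's distance function-based clustering. The function names, layer sizes, learning rate, and synthetic data are all hypothetical; this is not the paper's architecture or the MQTTIOT2020 setup.

```python
import numpy as np

def train_autoencoder(X, latent_dim, epochs=500, lr=0.05, seed=0):
    """Hypothetical one-hidden-layer autoencoder (tanh encoder, linear decoder),
    fitted without labels by full-batch gradient descent on mean squared error."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W1 = rng.normal(0.0, 0.1, (d, latent_dim))
    b1 = np.zeros(latent_dim)
    W2 = rng.normal(0.0, 0.1, (latent_dim, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)          # latent code vectors
        X_hat = H @ W2 + b2               # reconstruction
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared reconstruction error
        g_out = 2.0 * err / err.size
        g_W2 = H.T @ g_out
        g_b2 = g_out.sum(axis=0)
        g_pre = (g_out @ W2.T) * (1.0 - H ** 2)  # tanh'(z) = 1 - tanh(z)^2
        g_W1 = X.T @ g_pre
        g_b1 = g_pre.sum(axis=0)
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2

    def encode(Z):
        return np.tanh(Z @ W1 + b1)

    return encode, losses

def kmeans(Z, k, iters=50, seed=0):
    """Plain k-means: assign each code to its nearest centroid (Euclidean), then re-average."""
    rng = np.random.default_rng(seed)
    centroids = Z[rng.choice(len(Z), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):       # leave empty clusters where they are
                centroids[j] = Z[labels == j].mean(axis=0)
    return labels, centroids

# Synthetic stand-in for traffic records: 10 features driven by 2 hidden factors
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.05 * rng.normal(size=(200, 10))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each feature

encode, losses = train_autoencoder(X, latent_dim=3)
codes = encode(X)                          # 10-D records -> 3-D codes
labels, centroids = kmeans(codes, k=2)
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}; code shape {codes.shape}")
```

The memory saving the abstract mentions shows up here as the drop from 10 to 3 columns per record; a downstream classifier would then be trained on `codes` (optionally with the cluster labels) instead of `X`.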
Gene regulatory network (GRN) reconstruction aims to infer potential regulatory relationships among genes. With the rapid growth of biotechnologies such as ribonucleic acid sequencing (RNA-seq) and gene-chip microarrays, the resulting high-throughput gene expression data offer more opportunities to infer gene–gene interactions. Several approaches have been introduced to reconstruct GRNs, but low accuracy remains a major drawback. Hence, this paper introduces a hybrid distance measure together with Pearson's correlation coefficient for reconstructing GRNs. The hybrid distance, built from the Tversky index, Tanimoto similarity, and Minkowski distance, is used to connect edges. An asymmetric partial correlation network is introduced to determine two influence functions for every gene pair, from which the edge direction is determined. However, edge directions are usually unknown and difficult to identify from gene expression data alone. The method therefore extends the data processing inequality to directed networks to remove transitive interactions. The influence value of every node is then calculated to identify significant regulators. The proposed Hybrid Distance_Entropy based GRN Reconstruction method is evaluated in terms of correlation, reconstruction error, precision, and recall, and achieves superior results of 0.9450, 0.00052, 0.9095, and 0.8913, respectively, on dataset-1.
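The edge-scoring and pruning steps can be illustrated with the individual measures named above. The following is a hypothetical NumPy sketch, not the paper's method. It implements the Tversky index (on profiles binarized at each gene's median), continuous Tanimoto similarity, and Minkowski distance separately, because the abstract does not state how they are combined. It also applies an undirected, ARACNE-style data processing inequality (drop the weakest edge of every fully connected triplet) to absolute Pearson correlations instead of the paper's directed extension. All function names and the synthetic cascade are assumptions for illustration.

```python
import numpy as np

def tversky_index(a, b, alpha=0.5, beta=0.5):
    """Tversky index of two boolean vectors, read as sets of samples where a gene is 'on'."""
    inter = np.sum(a & b)
    only_a = np.sum(a & ~b)
    only_b = np.sum(~a & b)
    denom = inter + alpha * only_a + beta * only_b
    return float(inter / denom) if denom > 0 else 0.0

def tanimoto_similarity(x, y):
    """Continuous Tanimoto similarity x.y / (|x|^2 + |y|^2 - x.y); assumes x, y are not both zero."""
    dot = float(x @ y)
    return dot / (float(x @ x) + float(y @ y) - dot)

def minkowski_distance(x, y, p=3):
    """Minkowski distance of order p (p=2 is Euclidean)."""
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))

def dpi_prune(W):
    """Undirected DPI: remove the weakest edge of every fully connected triplet.
    Comparisons use the original weights, so the result does not depend on loop order."""
    n = W.shape[0]
    keep = W > 0
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(n):
                if k == i or k == j:
                    continue
                if W[i, k] > 0 and W[j, k] > 0 and W[i, j] < min(W[i, k], W[j, k]):
                    keep[i, j] = keep[j, i] = False
    return np.where(keep, W, 0.0)

# Synthetic cascade g0 -> g1 -> g2 (samples x genes); g0-g2 is only a transitive link
rng = np.random.default_rng(0)
g0 = rng.normal(size=1000)
g1 = g0 + 0.5 * rng.normal(size=1000)
g2 = g1 + 0.5 * rng.normal(size=1000)
expr = np.column_stack([g0, g1, g2])

W = np.abs(np.corrcoef(expr, rowvar=False))  # |Pearson r| as edge weight
np.fill_diagonal(W, 0.0)                     # no self-loops
pruned = dpi_prune(W)
print(np.round(pruned, 3))

on = expr > np.median(expr, axis=0)          # binarize each gene at its median
print(tversky_index(on[:, 0], on[:, 1]),
      tanimoto_similarity(g0, g1),
      minkowski_distance(g0, g1, p=2))
```

In this cascade the g0–g2 correlation is weaker than both direct links, so the DPI step removes it and keeps the two direct regulatory edges. With `alpha = beta = 0.5` the Tversky index reduces to the Dice coefficient.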