Research on big data analytics is entering in the new phase called fast data where multiple gigabytes of data arrive in the big data systems every second. Modern big data systems collect inherently complex data streams due to the volume, velocity, value, variety, variability, and veracity in the acquired data and consequently give rise to the 6Vs of big data. The reduced and relevant data streams are perceived to be more useful than collecting raw, redundant, inconsistent, and noisy data. Another perspective for big data reduction is that the million variables big datasets cause the curse of dimensionality which requires unbounded computational resources to uncover actionable knowledge patterns. This article presents a review of methods that are used for big data reduction. It also presents a detailed taxonomic discussion of big data reduction methods including the network theory, big data compression, dimension reduction, redundancy elimination, data mining, and machine learning methods. In addition, the open research issues pertinent to the big data reduction are also highlighted.
The significant development of Internet applications over the past 10 years has resulted in the rising necessity for the information network to be secured. An intrusion detection system is a fundamental network infrastructure defense that must be able to adapt to the ever-evolving threat landscape and identify new attacks that have low false alarm. Researchers have developed several supervised as well as unsupervised methods from the data mining and machine learning disciplines so that anomalies can be detected reliably. As an aspect of machine learning, deep learning uses a neuron-like structure to learn tasks. A successful deep learning technique method is convolution neural network (CNN); however, it is presently not suitable to detect anomalies. It is easier to identify expected contents within the input flow in CNNs, whereas there are minor differences in the abnormalities compared to the normal content. This suggests that a particular method is required for identifying such minor changes. It is expected that CNNs would learn the features that form the characteristic of the content of an image (flow) rather than variations that are unrelated to the content. Hence, this study recommends a new CNN architecture type known as mean convolution layer (CNN-MCL) that was developed for learning the anomalies’ content features and then identifying the particular abnormality. The recommended CNN-MCL helps in designing a strong network intrusion detection system that includes an innovative form of convolutional layer that can teach low-level abnormal characteristics. It was observed that assessing the proposed model on the CICIDS2017 dataset led to favorable results in terms of real-world application regarding detecting anomalies that are highly accurate and have low false-alarm rate as opposed to other best models.
BackgroundThe drug discovery and development pipeline is a long and arduous process that inevitably hampers rapid drug development. Therefore, strategies to improve the efficiency of drug development are urgently needed to enable effective drugs to enter the clinic. Precision medicine has demonstrated that genetic features of cancer cells can be used for predicting drug response, and emerging evidence suggest that gene-drug connections could be predicted more accurately by exploring the cumulative effects of many genes simultaneously.ResultsWe developed DeSigN, a web-based tool for predicting drug efficacy against cancer cell lines using gene expression patterns. The algorithm correlates phenotype-specific gene signatures derived from differentially expressed genes with pre-defined gene expression profiles associated with drug response data (IC50) from 140 drugs. DeSigN successfully predicted the right drug sensitivity outcome in four published GEO studies. Additionally, it predicted bosutinib, a Src/Abl kinase inhibitor, as a sensitive inhibitor for oral squamous cell carcinoma (OSCC) cell lines. In vitro validation of bosutinib in OSCC cell lines demonstrated that indeed, these cell lines were sensitive to bosutinib with IC50 of 0.8–1.2 μM. As further confirmation, we demonstrated experimentally that bosutinib has anti-proliferative activity in OSCC cell lines, demonstrating that DeSigN was able to robustly predict drug that could be beneficial for tumour control.ConclusionsDeSigN is a robust method that is useful for the identification of candidate drugs using an input gene signature obtained from gene expression analysis. This user-friendly platform could be used to identify drugs with unanticipated efficacy against cancer cell lines of interest, and therefore could be used for the repurposing of drugs, thus improving the efficiency of drug development.Electronic supplementary materialThe online version of this article (doi:10.1186/s12864-016-3260-7) contains supplementary material, which is available to authorized users.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.