Background: Metagenomics is the study of genetic material derived directly from complex microbial samples rather than from cultures. One of the crucial steps in metagenomic analysis, referred to as “binning”, is to separate reads into clusters that represent genomes from closely related organisms. Among existing binning methods, unsupervised methods base the classification on features extracted from the reads themselves, which is especially advantageous when reference databases are limited. However, their performance, in various respects, is still being investigated by recent theoretical and empirical studies. The work presented in this paper is one of those efforts to improve classification accuracy. Results: This paper presents an unsupervised algorithm, called BiMeta, for binning reads from different species in a metagenomic dataset. The algorithm consists of two phases. In the first phase, reads are grouped based on overlap information between them. The second phase merges the groups by using an observation on the l-mer frequency distribution of sets of non-overlapping reads. Experimental results on simulated and real datasets show that BiMeta outperforms three state-of-the-art binning algorithms on both short-read and long-read (≥700 bp) datasets. Conclusions: This paper develops a novel and efficient algorithm for binning metagenomic reads that does not require any reference database. The software implementing the algorithm and all test datasets mentioned in this paper can be downloaded at http://it.hcmute.edu.vn/bioinfo/bimeta/index.htm. Electronic supplementary material: The online version of this article (doi:10.1186/s13015-014-0030-4) contains supplementary material, which is available to authorized users.
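To make the notion of an l-mer frequency distribution concrete, the following is a minimal Python sketch, not BiMeta's implementation. It computes a normalized l-mer frequency vector for a set of reads and compares two such vectors. The function names, the default l = 4, and the use of Euclidean distance are assumptions made for illustration; the abstract does not state which value of l or which merging criterion BiMeta uses.

```python
from collections import Counter
from itertools import product
from math import sqrt


def lmer_frequencies(reads, l=4):
    """Return the normalized frequency of every possible l-mer over a set of reads.

    The vector has 4**l entries, ordered lexicographically over A, C, G, T.
    l=4 is an illustrative default, not a value taken from the paper.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - l + 1):
            lmer = read[i:i + l]
            # Skip windows containing ambiguous bases such as N.
            if set(lmer) <= set("ACGT"):
                counts[lmer] += 1
    total = sum(counts.values())
    alphabet = ["".join(p) for p in product("ACGT", repeat=l)]
    return [counts[m] / total if total else 0.0 for m in alphabet]


def euclidean(u, v):
    """Euclidean distance between two frequency vectors (assumed metric)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

Two groups of reads whose frequency vectors are close under the chosen distance would be candidates for merging; the threshold and the clustering procedure are left open here because the abstract does not specify them.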
The UNSW-NB15 dataset was created by the Australian Cyber Security Centre in 2015 by using the IXIA tool to extract normal behaviors and modern attacks; it includes normal data and 9 types of attacks, described by 49 features. Previous research shows that the detection of Fuzzers attacks in this dataset yields the lowest classification quality. This paper analyzes and evaluates the performance of known ensemble techniques such as Bagging, AdaBoost, Stacking, Decorate, Random Forest and Voting for building models that detect Fuzzers attacks on the UNSW-NB15 dataset. The experimental results show that the AdaBoost technique with decision trees as component classifiers gives the best classification quality, with an F-Measure of 96.76%, compared to 94.16%, the best result obtained using single classifiers, and 96.36% using the Random Forest technique.
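For readers unfamiliar with the metric, the F-Measure combines precision and recall into a single score. The sketch below assumes the standard F-beta definition with beta = 1 (the harmonic mean of precision and recall); the abstract does not say which variant or averaging scheme the authors used, and the function name is hypothetical.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-beta score from true positives, false positives and false negatives.

    beta=1.0 gives the usual F1 score. Which variant the paper reports is
    an assumption here, not something the abstract states.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With made-up counts, `f_measure(90, 10, 10)` has precision and recall both equal to 0.9, so the score is also 0.9.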
Building a good intrusion detection system (IDS) model from a given dataset is one of the main tasks in machine learning. Training multiple classifiers at the same time to solve the same problem and then combining their outputs to improve classification quality is called an ensemble method. This paper analyzes and evaluates the performance of known ensemble techniques such as Bagging, AdaBoost, Stacking, Decorate, Random Forest and Voting for detecting DoS attacks on the UNSW-NB15 dataset, created by the Australian Cyber Security Centre in 2015. The experimental results show that the Stacking technique with heterogeneous classifiers gives the best classification quality, with an F-Measure of 99.28%, compared to 98.61%, the best result obtained using single classifiers, and 99.02% using the Random Forest technique.
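The idea of combining classifier outputs can be illustrated with the simplest of the listed techniques, Voting. The following is a minimal sketch of hard majority voting over predictions that have already been produced by several classifiers; the function name and the example labels are hypothetical, and this is not the Stacking configuration the paper reports as best.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier predictions by hard majority voting.

    predictions: a list with one entry per classifier, each a list of
    predicted labels for the same samples in the same order.
    Ties go to the label seen first among the tied votes.
    """
    combined = []
    for votes in zip(*predictions):
        label, _ = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical predictions from three classifiers on three samples.
preds = [
    ["dos", "normal", "dos"],
    ["dos", "dos", "normal"],
    ["normal", "normal", "normal"],
]
result = majority_vote(preds)
```

Stacking differs in that a further classifier is trained on the component outputs instead of applying a fixed rule such as the vote above.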
Semantic extraction for images is a pressing problem with applications in many semantic retrieval systems. In this paper, a semantic-based image retrieval (SBIR) system is proposed based on the combination of a growth partitioning tree (GP-Tree), built in the authors' previous work, with a self-organizing map (SOM) network and a neighbor graph (called SgGP-Tree) to improve accuracy. For each query image, a set of similar images is retrieved on the SgGP-Tree, and a set of visual words is extracted based on the classes obtained from a mask region-based convolutional neural network (Mask R-CNN); these visual words are the basis for querying the semantics of the input image on an ontology using a SPARQL (SPARQL Protocol and RDF Query Language) query. The experiments were performed on image datasets such as ImageCLEF and MS-COCO, with precision values of 0.898453 and 0.875467, respectively. These results are compared with related works on the same image datasets, showing the effectiveness of the proposed method.