Background: Metagenomics is the study of genetic material derived directly from complex microbial samples rather than from cultures. One of the crucial steps in metagenomic analysis, referred to as "binning", is to separate reads into clusters that represent genomes from closely related organisms. Among existing binning methods, unsupervised methods base the classification on features extracted from the reads themselves, which is especially advantageous when reference databases are limited. However, their performance is still being investigated, under various aspects, by recent theoretical and empirical studies. The work addressed in this paper is among those efforts to enhance classification accuracy. Results: This paper presents an unsupervised algorithm, called BiMeta, for binning reads from different species in a metagenomic dataset. The algorithm consists of two phases. In the first phase, reads are clustered into groups based on overlap information between the reads. The second phase merges the groups by using an observation on the l-mer frequency distribution of sets of non-overlapping reads. Experimental results on simulated and real datasets show that BiMeta outperforms three state-of-the-art binning algorithms on both short-read and long-read (≥700 bp) datasets. Conclusions: This paper develops a novel and efficient algorithm for binning metagenomic reads that does not require any reference database. The software implementing the algorithm and all test datasets mentioned in this paper can be downloaded at http://it.hcmute.edu.vn/bioinfo/bimeta/index.htm. Electronic supplementary material: The online version of this article (doi:10.1186/s13015-014-0030-4) contains supplementary material, which is available to authorized users.
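To make the idea of an l-mer frequency distribution concrete, here is a minimal Python sketch. It is an illustration only, not BiMeta's merge criterion: the function names `lmer_profile` and `profile_distance`, the default l = 4, and the use of Euclidean distance are all assumptions made for this example.

```python
from collections import Counter
from itertools import product
import math

ALPHABET = "ACGT"


def lmer_profile(reads, l=4):
    """Return the normalized l-mer frequency vector of a set of reads.

    Hypothetical helper for illustration; l=4 is an assumed default.
    """
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - l + 1):
            lmer = seq[i:i + l]
            # Skip l-mers containing ambiguous bases such as N.
            if set(lmer) <= set(ALPHABET):
                counts[lmer] += 1
    total = sum(counts.values())
    keys = ["".join(p) for p in product(ALPHABET, repeat=l)]
    if total == 0:
        return [0.0] * len(keys)
    return [counts[k] / total for k in keys]


def profile_distance(p, q):
    """Euclidean distance between two frequency vectors (an assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Toy groups of reads: a and b share sequence content, c does not.
group_a = ["ACGTACGTAC", "GTACGTACGT"]
group_b = ["ACGTACGTTT"]
group_c = ["GGGGCCCCGG", "CCGGGGCCCC"]

d_ab = profile_distance(lmer_profile(group_a), lmer_profile(group_b))
d_ac = profile_distance(lmer_profile(group_a), lmer_profile(group_c))
print(d_ab < d_ac)
```

In this toy setup, groups with similar sequence composition end up with closer profiles, which is the kind of signal a merging phase could exploit.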
Structural and optical properties of quantum wells (QWs) of various shapes, including rectangular, triangular, trapezoidal, and polygonal ones, are investigated. Photoluminescence (PL) measurements show that the highest light-emission efficiency and the best reproducibility in intensity and wavelength are obtained from trapezoidal QWs. The temperature dependence of the PL spectra indicates a more localized nature of excitons in the trapezoidal QWs. Plan-view transmission electron microscopy shows that quantum dots (QDs) are formed inside the dislocation loop in trapezoidal QWs. The distribution of QDs in size and composition is more uniform in trapezoidal QWs than in rectangular QWs, leading to superior light-emission characteristics. It is suggested that QD engineering and dislocation control are possible, to some extent, by modulating the QW shape in InGaN/GaN-based light-emitting devices.
One of the main challenges in designing routing protocols is how to use energy efficiently to extend the lifespan of the whole wireless sensor network (WSN), because sensor nodes have limited battery power. In this work, we propose a Sector Tree-Based clustering routing protocol for Energy Efficiency (STB-EE) to address this problem. The entire network area is partitioned into dynamic sectors (clusters) that balance the number of alive nodes. The nodes in each sector communicate only with their nearest neighbour by constructing a minimum spanning tree based on the Kruskal algorithm. A mixed measure, combining the distance from a candidate node to the base station (BS) with the remaining energy of the candidate node, determines which node becomes the cluster head (CH) in each cluster. By computing a suitable duration for each round, STB-EE increases the number of data packets sent to the BS. Our simulation results show that STB-EE improves the network lifespan by about 16% and 10% in comparison to power-efficient gathering in sensor information system (PEGASIS) and improved energy-efficient PEGASIS-based protocol (IEEPB), respectively.
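The two mechanisms named in this abstract, a Kruskal-based tree inside a sector and a cluster-head choice mixing distance to the BS with remaining energy, can be sketched in Python as follows. This is a sketch under stated assumptions, not the STB-EE implementation: the abstract does not give the scoring formula, so the weighted sum in `choose_cluster_head` and its parameter `alpha` are hypothetical, as are both function names.

```python
import math


def kruskal_mst(points):
    """Return minimum spanning tree edges (i, j, dist) over 2-D points."""
    n = len(points)
    # All pairwise edges, sorted by Euclidean length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so it creates no cycle
            parent[ri] = rj
            mst.append((i, j, d))
            if len(mst) == n - 1:
                break
    return mst


def choose_cluster_head(points, energies, base_station, alpha=0.5):
    """Pick the node index with the best mixed score.

    ASSUMPTION: score = alpha * normalized energy
                      + (1 - alpha) * normalized closeness to the BS.
    The source abstract does not specify the actual formula.
    """
    max_d = max(math.dist(p, base_station) for p in points) or 1.0
    max_e = max(energies) or 1.0

    def score(k):
        closeness = 1.0 - math.dist(points[k], base_station) / max_d
        return alpha * energies[k] / max_e + (1 - alpha) * closeness

    return max(range(len(points)), key=score)


# Toy sector: four nodes, their residual energies, and a base station.
nodes = [(0, 0), (1, 0), (2, 0), (2, 1)]
energy = [0.2, 0.9, 0.5, 0.4]
tree = kruskal_mst(nodes)
head = choose_cluster_head(nodes, energy, base_station=(1, 5))
print(tree, head)
```

Raising `alpha` favours nodes with more remaining energy, while lowering it favours nodes closer to the BS; how the actual protocol balances the two is not stated in the abstract.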
Abstract. The binning of reads is a crucial step in metagenomic data analysis. While unsupervised methods based on composition features are efficient only for long reads, genome abundance-based methods are often used for binning short reads. Existing abundance-based binning approaches usually use fixed-length l-mer frequencies to separate reads into groups such that the reads in each group belong to genomes (or species) of very similar abundance. However, their classification performance is very sensitive to the length of the l-mers, and they have difficulty separating reads from low-abundance genomes because short l-mers are repeated within those genomes. In this paper, a new variable-length l-mer counting method is proposed to handle the repetition of short l-mers and thereby improve the accuracy of abundance-based binning approaches. Computational experiments demonstrate that an improved version of AbundanceBin (a commonly used binning method) to which the proposed method is applied achieves higher accuracy than the original. The software implementing the approach can be downloaded at http://it.hcmute.edu.vn/bioinfo/MetaSeqBin/index.htm Keywords. metagenomics, binning of reads, l-mer counting, DNA sequences, next-generation sequencing.
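The sensitivity to l-mer length described above can be illustrated with a small Python sketch of fixed-length counting. It shows only the baseline idea, not the proposed variable-length method or AbundanceBin itself: the helper names `count_lmers` and `read_abundance_score`, and the use of a mean count as a per-read abundance signal, are assumptions made for this example.

```python
from collections import Counter


def count_lmers(reads, l):
    """Count every l-mer across the whole read set."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - l + 1):
            counts[read[i:i + l]] += 1
    return counts


def read_abundance_score(read, counts, l):
    """Mean dataset-wide count of the l-mers in one read.

    ASSUMPTION: a simple mean serves as the abundance signal here.
    """
    lmers = [read[i:i + l] for i in range(len(read) - l + 1)]
    if not lmers:
        return 0.0
    return sum(counts[m] for m in lmers) / len(lmers)


# Toy dataset: one sequence sampled five times (high abundance)
# and one sequence sampled once (low abundance).
high = "ACGTTGCA"
low = "ACGTGGCA"
reads = [high] * 5 + [low]

for l in (2, 4):
    counts = count_lmers(reads, l)
    print(l, read_abundance_score(high, counts, l),
          read_abundance_score(low, counts, l))
```

With l = 2 the low-abundance read shares most of its l-mers with the abundant sequence, so its score is inflated and close to that of the abundant reads. With l = 4 the two scores separate clearly, which is the length sensitivity the abstract refers to.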