The substantial cost reduction and massive output of next-generation sequencing (NGS) have driven the rapid growth of metagenomics. However, the sheer volume of NGS data has exposed the difficulty of handling it with existing metagenomics bioinformatics tools. In this study, we therefore analyzed the same set of DNA metagenomic data from a palm oil mill effluent (POME) sample using three freely available web-based bioinformatics pipelines, the metagenomics RAST server (MG-RAST), Integrated Microbial Genomes with Microbiome Samples (IMG/M) and European Bioinformatics Institute (EBI) Metagenomics, and compared them in terms of taxonomic assignment and functional analysis. We found that MG-RAST was the quickest of the three pipelines. In terms of results, however, IMG/M reported a greater variety of phyla with a wider range of percent identities for taxonomic assignment, as well as the highest functional annotation of carbohydrate, amino acid, lipid, and coenzyme transport and metabolism and the highest total number of glycoside hydrolase enzymes. For identifying the conserved domains and families involved, EBI Metagenomics was the most appropriate. Each of the three pipelines has its own strengths, and they can be used interchangeably or together depending on the user's functional preference.
ABSTRAK (translated from Malay): Large-scale cost reduction and massive production of next-generation sequencing (NGS) data have contributed to the rapid growth of metagenomics. However, the large-scale production of data by NGS has created challenges in handling the existing bioinformatics tools related to metagenomics. Therefore, in this study, we investigated the same set of DNA metagenomic data from a palm oil mill effluent sample using three free bioinformatics websites, namely the metagenomics RAST server (MG-RAST), Integrated Microbial Genomes with Microbiome Samples (IMG/M) and European Bioinformatics Institute (EBI) Metagenomics, in terms of taxonomy and functional analysis. We found that MG-RAST is the fastest of the three pipelines, but according to the analysis results, IMG/M yields more diverse phylum information with a wider range of percent identities than the others for taxonomic assignment, and IMG/M also has the highest readings in almost all functional annotations of carbohydrate, amino acid, lipid, and coenzyme transport and metabolism, as well as the highest total number of glycoside hydrolase enzymes. For identifying the conserved domains and families involved, EBI Metagenomics is more suitable. The three bioinformatics pipelines each have their own strengths and can be used interchangeably or at the same time based on the user's functional preference.
A metagenomic DNA library from palm oil mill effluent (POME) was constructed and subjected to high-throughput screening to find genes encoding cellulose- and xylan-degrading enzymes. DNA from 30 positive fosmid clones was sequenced with next-generation sequencing technology, and the raw data (short-insert paired reads) were analyzed with bioinformatic tools. First, the quality of the raw data, 64,821,599 forward and reverse sequences of 101 bp each, was checked using FastQC and SOLEXA. The raw data were then filtered by trimming low-quality bases, discarding short reads and removing vector sequences; the output was checked again and the trimming repeated until a high-quality read set was obtained. The second step was de novo assembly of the sequences, which reconstructed 2900 contigs using a de Bruijn graph algorithm. The pre-assembled contigs were ordered and oriented, and the distances between them estimated, with SSPACE, yielding 2139 scaffolds. In total, 16,386 genes were identified after gene prediction with Prodigal and putative ID assignment with BLASTp against the NR protein database. We present a workable strategy for handling metagenomic NGS data in order to detect known and potentially unknown genes, and we show the computational efficiency of de Bruijn graph-based de novo assembly, which led to the bioprospecting of 21 genes encoding cellulose-degrading enzymes and 6 genes encoding xylan-degrading enzymes, with identity percentages of 30.3% to 100%.
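As a rough illustration of the de Bruijn graph idea behind the assembly step, the following minimal Python sketch builds a graph from the k-mers of a few toy reads and spells out its non-branching paths as contigs. It is not the assembler used in the study, which the abstract does not name; the function names `build_de_bruijn` and `assemble_contigs`, the toy reads and the value of k are all assumptions made for illustration, and real assemblers additionally handle sequencing errors, reverse complements, coverage and isolated cycles.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def assemble_contigs(graph):
    """Spell out maximal non-branching paths as contigs (hypothetical helper).

    Isolated cycles are ignored in this simplified sketch.
    """
    indeg = defaultdict(int)
    for successors in graph.values():
        for dst in successors:
            indeg[dst] += 1
    nodes = set(graph) | set(indeg)

    def is_inner(node):
        # Exactly one way in and one way out: the path can be extended.
        return indeg[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for node in sorted(nodes):
        if is_inner(node):
            continue  # contigs start only at branch points or path ends
        for nxt in sorted(graph.get(node, ())):
            contig = node + nxt[-1]
            while is_inner(nxt):
                (nxt,) = graph[nxt]
                contig += nxt[-1]
            contigs.append(contig)
    return contigs


# Toy overlapping reads (made up for this example).
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
print(assemble_contigs(build_de_bruijn(reads, k=4)))  # prints ['ATGGCGTGCAAT']
```

Because every (k-1)-mer in these toy reads is unique, the graph is a single unbranched path and the three reads collapse into one contig; repeated (k-1)-mers would create branch points and split the output into several contigs.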