Metagenomics has redefined many areas of microbiology. However, metagenome-assembled genomes (MAGs) are often fragmented, particularly when sequencing was performed with short reads. Recent long-read sequencing technologies promise to improve genome reconstruction, but integrating two different sequencing modalities makes downstream analyses complex. We therefore developed MUFFIN, a complete metagenomic workflow that uses short and long reads to produce high-quality bins and their annotations. The workflow is written in Nextflow, a workflow orchestration software, to achieve high reproducibility and fast, straightforward use. It also produces the taxonomic classification and KEGG pathways of the bins and, when RNA-Seq data are optionally provided, can be further used for quantification and annotation. We tested the workflow using twenty biogas reactor samples and assessed the capacity of MUFFIN to process and output the files needed to analyze the microbial community and its function. MUFFIN produces functional pathway predictions and, if RNA-Seq data are provided, de novo metatranscript annotations across the metagenomic sample and for each bin. MUFFIN is available on GitHub under GNUv3 licence: https://github.com/RVanDamme/MUFFIN.
Author Summary: RVD designed and developed MUFFIN and wrote the first draft; BM and EBR critically read and corrected the manuscript; MH critically read the manuscript and made general adjustments to the metagenomic workflow; AV critically read the manuscript and made adjustments to the taxonomic classifications; CB supervised the project, designed the workflow, helped with the implementation, and revised the manuscript.
EpiCass and CassavaNet4Dev are collaborative projects, funded by the Swedish Research Council, between the Swedish University of Agricultural Sciences (SLU) and the International Institute of Tropical Agriculture (IITA). The projects aim to investigate the influence of epigenetic changes on agricultural traits such as yield and virus resistance while also providing African students and researchers with advanced bioinformatics training and opportunities to participate in big data analysis events. The first advanced bioinformatics training workshop took place from May 16th to May 18th, 2022, followed by an online mini-symposium titled "Epigenetics and crop improvement" on May 19th. The symposium featured international speakers covering a wide range of topics related to plant epigenetics, cassava viral diseases, and cassava breeding strategies. A new combined online and on-site teaching concept was developed for the three-day workshop to ensure maximum student participation across Western, Eastern, and Southern Africa. Initially planned for Nigeria, Kenya, Ethiopia, Tanzania, and Zambia, the workshop ultimately focused on Nigeria, Kenya, and Ethiopia due to a lack of qualified candidates in the other countries. Each classroom hosted 20 to 25 students, with at least one bioinformatician present for support. The classrooms were connected via video conferencing, and teachers located in different places in Africa and Europe joined the video stream to conduct teaching sessions. The workshop was divided into theoretical classes and hands-on sessions, where participants could run data analysis with support from online teachers and local bioinformaticians. To enable participants to run guided, CPU- and RAM-intensive data analysis workflows and to overcome local computing and internet access restrictions, a system of virtual machines (VMs) hosted in the cloud was developed. The teaching platform provided teaching and exercise materials to support the use of the VMs.
Although some students were initially unable to run heavy data analysis workflows due to unforeseen restrictions in the cloud, these issues were resolved. All participants had the opportunity to run the analysis steps independently in the cloud using the protocols hosted on the teaching platform.