Bioinformatic analysis (such as genome assembly quality assessment, alignment summary statistics, relative synonymous codon usage, file format conversion, and other processing and analysis) is integrated into diverse disciplines in the biological sciences. Several command-line programs have been developed to conduct some of these individual analyses, but unified toolkits that conduct all these analyses are lacking. To address this gap, we introduce BioKIT, a versatile command-line toolkit with 42 functions upon publication, several of which were community-sourced, that conduct routine and novel processing and analysis of genome assemblies, multiple sequence alignments, coding sequences, sequencing data, and more. To demonstrate the utility of BioKIT, we conducted a comprehensive examination of relative synonymous codon usage across 171 fungal genomes that use alternative genetic codes, showed that the novel metric of gene-wise relative synonymous codon usage can accurately estimate gene-wise codon optimization, evaluated the quality and characteristics of 901 eukaryotic genome assemblies, and calculated alignment summary statistics for 10 phylogenomic data matrices. BioKIT will be helpful in facilitating and streamlining sequence analysis workflows. BioKIT is freely available under the MIT license from GitHub (https://github.com/JLSteenwyk/BioKIT), PyPI (https://pypi.org/project/jlsteenwyk-biokit/), and the Anaconda Cloud (https://anaconda.org/jlsteenwyk/jlsteenwyk-biokit). Documentation, user tutorials, and instructions for requesting new features are available online (https://jlsteenwyk.com/BioKIT).
In industry, the yeast Rhodotorula mucilaginosa is commonly used for the production of carotenoids. Carotenoid production is important because carotenoids are used as natural colorants in food and some are precursors of retinol (vitamin A). However, the identification and molecular characterization of the carotenoid pathway(s) in species of the genus Rhodotorula remain scarce because genomic information is lacking, potentially impeding effective metabolic engineering of these yeast strains for improved carotenoid production. In this study, we report the isolation, identification, and characterization, as well as the whole nuclear genome and mitogenome sequences, of the endophyte R. mucilaginosa RIT389 isolated from Distemonanthus benthamianus, a plant known for its antifungal and antibacterial properties and commonly used as chewing sticks. The assembled genome of R. mucilaginosa RIT389 is 19 Mbp in length with an estimated genomic heterozygosity of 9.29%. Whole-genome phylogeny supports the species designation of strain RIT389 within the genus and the monophyly of the currently sequenced Rhodotorula species. Further, we report, for the first time, the recovery of the complete mitochondrial genome of R. mucilaginosa using the genome skimming approach. The assembled mitogenome is at least 7,000 bases larger than that of Rhodotorula taiwanensis, a difference largely attributed to the presence of large intronic regions containing open reading frames coding for homing endonucleases of the LAGLIDADG and GIY-YIG families. Furthermore, genomic regions containing the key genes for carotenoid production were identified in R. mucilaginosa RIT389, revealing differences in gene synteny that may play a role in the regulation of the biotechnologically important carotenoid synthesis pathways in yeasts.
The declining cost of performing bacterial whole-genome sequencing (WGS), coupled with the availability of large libraries of sequence data for well-characterized isolates, has enabled the application of machine-learning (ML) methods to the development of nonlinear sequence-based predictive models. We tested the ML-based model developed by Next Gen Diagnostics for prediction of cefepime phenotypic susceptibility results in Escherichia coli.
Bioinformatic analysis (such as genome assembly quality assessment, alignment summary statistics, relative synonymous codon usage, paired-end aware quality trimming and filtering of sequencing reads, file format conversion, and other processing and analysis) is integrated into diverse disciplines in the biological sciences. Several command-line programs have been developed to conduct some of these individual analyses; however, the lack of a unified toolkit that conducts all these analyses can be a barrier in workflows. To address this obstacle, we introduce BioKIT, a versatile toolkit for the UNIX shell environment with 40 functions, several of which were community-sourced, that conduct routine and novel processing and analysis of genome assemblies, multiple sequence alignments, coding sequences, sequencing data, and more. To demonstrate the utility of BioKIT, we assessed the quality and characteristics of 901 eukaryotic genome assemblies, calculated alignment summary statistics for 10 phylogenomic data matrices, determined relative synonymous codon usage across 171 fungal genomes including those that use alternative genetic codes, and demonstrated that a novel metric, gene-wise relative synonymous codon usage, can accurately estimate gene-wise codon optimization. BioKIT will be helpful in facilitating and streamlining sequence analysis workflows. BioKIT is freely available under the MIT license from GitHub (https://github.com/JLSteenwyk/BioKIT), PyPI (https://pypi.org/project/biokit), and the Anaconda Cloud (https://anaconda.org/JLSteenwyk/biokit). Documentation, user tutorials, and instructions for requesting new features are available online (https://jlsteenwyk.com/BioKIT).
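For readers unfamiliar with relative synonymous codon usage (RSCU), the following is a minimal sketch of the conventional definition: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. It is not BioKIT's implementation. The function name `rscu`, the use of the standard genetic code only, and the exclusion of stop codons are assumptions made for illustration. The gene-wise metric mentioned in the abstract is not covered here.

```python
from collections import Counter
from itertools import product

# Standard genetic code only (an assumption; the abstract also covers
# genomes that use alternative genetic codes).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(sequences):
    """Return {codon: RSCU} computed over in-frame coding sequences.

    Hypothetical helper for illustration. Codons with ambiguous bases
    and stop codons are ignored.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper().replace("U", "T")
        usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
        for i in range(0, usable, 3):
            codon = seq[i:i + 3]
            if CODON_TABLE.get(codon, "*") != "*":
                counts[codon] += 1

    # Group sense codons by the amino acid they encode.
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the input
        expected = total / len(codons)  # count under equal codon usage
        for c in codons:
            values[c] = counts[c] / expected
    return values


if __name__ == "__main__":
    for codon, value in sorted(rscu(["ATGTTTTTTTTCTAA"]).items()):
        print(codon, round(value, 3))
```

Under this definition a value above 1 indicates a codon used more often than expected under equal usage, and a value below 1 indicates one used less often. Amino acids encoded by a single codon always score 1.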