Motivation: Metagenomics has revolutionized microbiome research by enabling researchers to characterize the composition of complex microbial communities. Taxonomic profiling is one of the critical steps in metagenomic analyses. Marker genes, which are single-copy and universally found across Bacteria and Archaea, can provide accurate estimates of taxon abundances in the sample. Results: We present TIPP2, a marker gene-based abundance profiling method that combines phylogenetic placement with statistical techniques to control classification precision and recall. TIPP2 includes an updated set of reference packages and several algorithmic improvements over the original TIPP method. We find that TIPP2 provides comparable or better estimates of abundance than other profiling methods (including Bracken, mOTUsv2, and MetaPhlAn2), and strictly dominates other methods when there are under-represented (novel) genomes present in the dataset. Availability and Implementation: The code for our method is freely available in open source form at https://github.com/smirarab/sepp/blob/tipp2/README.TIPP.md. The code and procedure to create new reference packages for TIPP2 are available at https://github.com/shahnidhi/TIPP_reference_package. Supplementary information: Not available online.
Motivation: Microbial gene catalogs are data structures that organize genes found in microbial communities, providing a reference for standardized analysis of the microbes across samples and studies. Although gene catalogs are commonly used, they have not been critically evaluated for their effectiveness as a basis for metagenomic analyses. Results: As a case study, we investigate one such catalog, the Integrated Gene Catalog (IGC); however, our observations apply broadly to most gene catalogs constructed to date. We focus on both the approach used to construct this catalog and its effectiveness when used as a reference for microbiome studies. Our results highlight important limitations of the approach used to construct the IGC and call into question the broad usefulness of gene catalogs more generally. We also recommend best practices for the construction and use of gene catalogs in microbiome studies and highlight opportunities for future research. Availability: All supporting scripts for our analyses can be found on GitHub: https://github.com/SethCommichaux/IGC.git. The supporting data can be downloaded from: https://obj.umiacs.umd.edu/igc-analysis/IGC_analysis_data.tar.gz. Supplementary information: Supplementary data are available at Bioinformatics online.
Background: Germline mutations in BRCA1 and BRCA2 contribute almost equally to the causation of breast cancer (BC). The types of mutations that cause this condition in the Indian population are largely unknown. Purpose: In this cohort, 79 randomized BC patients were screened for various types of BRCA1 and BRCA2 mutations, including frameshift, nonsense, missense, in-frame and splice site types. Materials and methods: The purified extracted DNA of each referral patient was subjected to Sanger gene sequencing using Codon Code Analyzer and Mutation Surveyor, and to next-generation sequencing (NGS) methods with Ion Torrent software, after appropriate care. Results: The data revealed that 35 cases were positive for BRCA1 or BRCA2 (35/79: 44.3%). BRCA2 mutations were more frequent (52.4%) than BRCA1 mutations (47.6%). Five novel mutations detected in this study were p.pro163 frameshift, p.asn997 frameshift, p.ser148 frameshift and two splice site single-nucleotide polymorphisms (SNPs). Additionally, four nonsense mutations and one in-frame deletion were identified, all of which seemed to be pathogenic. Polymorphic SNPs accounted for the highest percentage of mutations (72/82: 87.8%) and spanned the pathogenic, likely pathogenic, likely benign, benign and variant of unknown significance (VUS) categories. Young age groups (20–60 years) had a high frequency of germline mutations (62/82: 75.6%) in the Indian population. Conclusion: This study suggested that polymorphic SNPs contributed a high percentage of mutations, along with five novel types. Younger age groups are prone to having BC with a higher mutational rate. Furthermore, SNPs were detected more frequently in exons 10, 11 and 16 of BRCA1 and BRCA2 than at polymorphic sites in exons 2, 3 and 9 of the two germline genes. These may contribute to BC, although missense types are known to confer susceptibility to cancer depending on the type of amino acid replaced in the protein and its association with pathologic events. Accordingly, appropriate counseling and treatment may be suggested.
The COVID-19 pandemic has emphasized the importance of accurate detection of known and emerging pathogens. However, robust characterization of pathogenic sequences remains an open challenge. To address this need, we developed SeqScreen, which accurately characterizes short nucleotide sequences using taxonomic and functional labels and a customized set of curated Functions of Sequences of Concern (FunSoCs) specific to microbial pathogenesis. We show that our ensemble machine learning model can label protein-coding sequences with FunSoCs with high recall and precision. SeqScreen is a step towards a novel paradigm of functionally informed synthetic DNA screening and pathogen characterization, and is available for download at www.gitlab.com/treangenlab/seqscreen.