A protein structure is represented as a network of residues in which edges are determined by intramolecular contacts. We introduce inhomogeneity into these networks by assigning each edge a weight that is determined by amino acid pair potentials. Two methodologies are used to calculate the average path lengths (APLs) between residue pairs: (i) the strong APL, which minimizes the maximum weight along the path, and (ii) the weak APL, which minimizes the total weight along the path. We systematically screen out edges whose potential is higher than a cutoff and calculate the shortest APLs in these reduced networks, while keeping chain connectivity. Perturbations introduced at a selected region of the residue network therefore propagate to remote regions only along the nonscreened edges that retain their ability to disseminate the perturbation. The shortest APLs computed from the reduced homogeneous networks with only the strongest few nonbonded pairs closely reproduce the strong APLs from the weighted networks. The rate of change in the APL of the reduced residue network relative to its randomly connected counterpart remains constant until a lower bound is reached. Upon further link removal, this property shows an abrupt increase toward random coil behavior. Under different perturbation scenarios, diverse optimal paths emerge for robust residue communication.
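The two path definitions in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `path_costs` and `average_path_length`, the toy edge list, and the assumption that all edge weights are nonnegative (for example, pair potentials shifted to be nonnegative) are introduced here purely for illustration. The weak variant is an ordinary Dijkstra search on summed weights; the strong variant uses the same search but combines costs with `max`, which yields the minimax (bottleneck) path cost.

```python
import heapq

def path_costs(n, edges, source, mode):
    """Best path cost from `source` to every reachable node.

    mode="weak":   cost of a path is the sum of its edge weights.
    mode="strong": cost of a path is its largest edge weight.
    Assumes nonnegative weights (an assumption of this sketch).
    """
    adj = {i: [] for i in range(n)}
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    combine = (lambda d, w: d + w) if mode == "weak" else max
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = combine(d, w)
            if nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return best

def average_path_length(n, edges, mode):
    """Mean best-path cost over all connected node pairs."""
    total, count = 0.0, 0
    for s in range(n):
        costs = path_costs(n, edges, s, mode)
        for t in range(s + 1, n):
            if t in costs:
                total += costs[t]
                count += 1
    return total / count if count else float("nan")

# Hypothetical 4-residue network: (node, node, weight)
toy_edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.5), (2, 3, 3.0), (1, 3, 5.0)]
weak_apl = average_path_length(4, toy_edges, "weak")
strong_apl = average_path_length(4, toy_edges, "strong")
```

In the toy network, the weak path from node 0 to node 2 takes the direct edge (total 1.5), whereas the strong path goes through node 1 because its largest edge (1.0) is smaller than the direct edge's weight.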
Summary: The precisionFDA Truth Challenge V2 aimed to assess the state-of-the-art of variant calling in difficult-to-map regions and the Major Histocompatibility Complex (MHC). Starting with FASTQ files, 20 challenge participants applied their variant calling pipelines and submitted 64 variant callsets for one or more sequencing technologies (~35X Illumina, ~35X PacBio HiFi, and ~50X Oxford Nanopore Technologies). Submissions were evaluated following best practices for benchmarking small variants with the new GIAB benchmark sets and genome stratifications. Challenge submissions included a number of innovative methods for all three technologies, with graph-based and machine-learning methods scoring best for short-read and long-read datasets, respectively. New methods out-performed the 2016 Truth Challenge winners, and new machine-learning approaches combining multiple sequencing technologies performed particularly well. Recent developments in sequencing and variant calling have enabled benchmarking variants in challenging genomic regions, paving the way for the identification of previously unknown clinically relevant variants.
Background: In recent years, there has been growing interest in expressing complex systems as networks of interacting nodes. Using descriptors from graph theory, it has been possible to classify many diverse systems derived from social and physical sciences alike. In particular, folded proteins, as examples of self-assembled complex molecules, have also been investigated intensely using these tools. However, additional measures are needed to classify different systems in order to dissect the underlying hierarchy. Methodology and Principal Findings: In this study, a general analytical relation for the dependence of nearest neighbor degree correlations on degree is derived. Dependence of local clustering on degree is shown to be the sole determining factor of assortative versus disassortative mixing in networks. The characteristics of networks constructed from spatial atomic/molecular systems, exemplified by self-organized residue networks built from folded protein structures and block copolymers, atomic clusters, and well-compressed polymeric melts, are studied. Distributions of statistical properties of the networks are presented. For these densely packed systems, assortative mixing in the network construction is found to apply, and conditions are derived for a simple linear dependence. Conclusions: Our analyses (i) reveal patterns that are common to close-packed clusters of atoms/molecules, (ii) identify the type of surface effects prominent in different close-packed systems, and (iii) associate fingerprints that may be used to classify networks with varying types of correlations.
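The two degree-dependent quantities named in the abstract above, the average nearest neighbor degree and the local clustering coefficient, can be measured empirically as in the following sketch. It does not reproduce the analytical relation derived in the study; the function name `degree_profiles` and the toy edge list are hypothetical, and nodes are assumed to be integer labels in an undirected, unweighted graph. A nearest neighbor degree that rises with degree indicates assortative mixing, and one that falls indicates disassortative mixing.

```python
from collections import defaultdict

def degree_profiles(edges):
    """Return (knn, clust): dicts mapping degree k to the mean
    nearest-neighbor degree and the mean local clustering
    coefficient of all nodes having degree k."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    knn_by_k = defaultdict(list)
    clust_by_k = defaultdict(list)
    for node, nbrs in adj.items():
        k = len(nbrs)
        # Mean degree of this node's neighbors
        knn_by_k[k].append(sum(len(adj[m]) for m in nbrs) / k)
        if k >= 2:
            # Count edges among the neighbors (each pair once)
            links = sum(1 for a in nbrs for b in nbrs
                        if a < b and b in adj[a])
            clust_by_k[k].append(2.0 * links / (k * (k - 1)))
        else:
            clust_by_k[k].append(0.0)
    knn = {k: sum(v) / len(v) for k, v in knn_by_k.items()}
    clust = {k: sum(v) / len(v) for k, v in clust_by_k.items()}
    return knn, clust

# Hypothetical toy graph: a triangle (0, 1, 2) with a pendant node 3
toy_graph = [(0, 1), (1, 2), (0, 2), (2, 3)]
knn, clust = degree_profiles(toy_graph)
```

In this toy graph the nearest neighbor degree decreases as degree increases, so the example is disassortative; the densely packed systems described in the abstract are reported to show the opposite trend.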
Graph-based genome reference representations have seen significant development, motivated by the inadequacy of the current human genome reference to represent the diverse genetic information from different human populations and its inability to maintain the same level of accuracy for non-European ancestries. While there have been many efforts to develop computationally efficient graph-based toolkits for NGS read alignment and variant calling, methods to curate genomic variants and subsequently construct genome graphs remain an understudied problem that inevitably determines the effectiveness of the overall bioinformatics pipeline. In this study, we discuss obstacles encountered during graph construction and propose methods for sample selection based on population diversity, graph augmentation with structural variants and resolution of graph reference ambiguity caused by information overload. Moreover, we present the case for iteratively augmenting tailored genome graphs for targeted populations and demonstrate this approach on the whole-genome samples of African ancestry. Our results show that population-specific graphs, as more representative alternatives to linear or generic graph references, can achieve significantly lower read mapping errors and enhanced variant calling sensitivity, in addition to providing the improvements of joint variant calling without the need of computationally intensive post-processing steps.