Newborn screening (NBS) for inborn metabolic disorders is a highly successful public health program that by design is accompanied by false-positive results. Here we trained a Random Forest machine learning classifier on screening data to improve prediction of true and false positives. Data included 39 metabolic analytes detected by tandem mass spectrometry and clinical variables such as gestational age and birth weight. Analytical performance was evaluated for a cohort of 2777 screen positives reported by the California NBS program, which consisted of 235 confirmed cases and 2542 false positives for one of four disorders: glutaric acidemia type 1 (GA-1), methylmalonic acidemia (MMA), ornithine transcarbamylase deficiency (OTCD), and very long-chain acyl-CoA dehydrogenase deficiency (VLCADD). Without changing the sensitivity to detect these disorders in screening, Random Forest-based analysis of all metabolites reduced the number of false positives for GA-1 by 89%, for MMA by 45%, for OTCD by 98%, and for VLCADD by 2%. All primary disease markers and previously reported analytes such as methionine for MMA and OTCD were among the top-ranked analytes. Random Forest’s ability to classify GA-1 false positives was found similar to results obtained using Clinical Laboratory Integrated Reports (CLIR). We developed an online Random Forest tool for interpretive analysis of increasingly complex data from newborn screening.
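One way to read "without changing the sensitivity" is that the score cutoff is placed at or below the lowest score of any confirmed case, and the false positives scoring below it are counted as removed. The sketch below shows that evaluation step only. It assumes a trained classifier has already produced one score per screen positive; the function name, the scores, and the labels are hypothetical and are not taken from the study.

```python
def false_positive_reduction(scores, is_confirmed):
    """Return (cutoff, fraction of false positives removed) at unchanged sensitivity.

    scores       -- classifier score per screen positive (higher = more disease-like)
    is_confirmed -- True for a confirmed case, False for a false positive
    """
    case_scores = [s for s, c in zip(scores, is_confirmed) if c]
    fp_scores = [s for s, c in zip(scores, is_confirmed) if not c]
    if not case_scores or not fp_scores:
        raise ValueError("need at least one confirmed case and one false positive")
    # Assumption: the cutoff is the lowest score among confirmed cases,
    # so every confirmed case is still flagged.
    cutoff = min(case_scores)
    removed = sum(1 for s in fp_scores if s < cutoff)
    return cutoff, removed / len(fp_scores)


# Hypothetical scores for eight screen positives (three confirmed cases).
scores = [0.95, 0.80, 0.60, 0.70, 0.40, 0.30, 0.10, 0.05]
is_confirmed = [True, True, True, False, False, False, False, False]
cutoff, reduction = false_positive_reduction(scores, is_confirmed)
print(f"cutoff={cutoff:.2f}, false positives removed={reduction:.0%}")
```

A cutoff chosen on the same data it is evaluated on will look optimistic, so in practice it would be derived on held-out or cross-validated scores.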
Microhaplotypes (MH) comprise multiple single nucleotide polymorphisms (SNPs) that are located within 300 bases of genomic sequence. Improved tools are needed to facilitate broader application of microhaplotypes in a diverse range of populations and forensic settings. We designed an assay for multiplex sequencing of 90 microhaplotypes (mMHseq) that includes 46 MH loci with high Effective Number of Alleles (A_e) from previous studies [1], and 44 high-A_e MH loci containing between four and fourteen SNPs that were identified from the 1000 Genomes (1KG) Project. The unique design of mMHseq integrates a novel method for multiplex amplification from small DNA amounts, and multiplex sequencing of 48 samples in a single MiSeq run to detect all relevant MH variation. Assay performance was evaluated in a cohort of 156 individuals from seven different world populations from Africa, Asia, and Europe. Three of those populations from East Africa (Chagga, Sandawe, and Zaramo) and one from Eastern Europe (Adygei) had sufficient individuals sequenced by the assay to be included in statistical analyses with the 26 1KG populations. For those 30 populations the mean global average A_e was 5.08 (range: 2.7-11.54) and mean informativeness for biogeographic variation (I_n) was 0.30 (range: 0.08-0.70). Eighty-five novel SNPs were detected in 58 of the 90 microhaplotypes. Open-source, web-based software was developed to visualize haplotype phase data for each microhaplotype and individual. Our approach for multiplex microhaplotype sequencing can be customized and expanded as novel loci are discovered.
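The Effective Number of Alleles is commonly defined as the reciprocal of the sum of squared allele frequencies at a locus. The sketch below computes it from a list of observed haplotype strings for a single microhaplotype. The function name and the haplotype data are hypothetical, and the abstract does not say how the assay's own software computes this statistic.

```python
from collections import Counter


def effective_number_of_alleles(haplotypes):
    """A_e = 1 / sum(p_i^2), where p_i is the frequency of haplotype allele i."""
    counts = Counter(haplotypes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no haplotypes observed")
    homozygosity = sum((n / total) ** 2 for n in counts.values())
    return 1.0 / homozygosity


# Hypothetical locus with four equally frequent haplotype alleles.
equal = ["ACGT"] * 4 + ["ACGA"] * 4 + ["TCGT"] * 4 + ["TCGA"] * 4
a_e_equal = effective_number_of_alleles(equal)

# Same four alleles, but one dominates: A_e drops toward 1.
skewed = ["ACGT"] * 13 + ["ACGA", "TCGT", "TCGA"]
a_e_skewed = effective_number_of_alleles(skewed)

print(a_e_equal, round(a_e_skewed, 3))
```

With equal frequencies A_e equals the number of distinct alleles. Skewed frequencies lower it, which is why high-A_e loci are more useful for distinguishing individuals.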
Newborn screening (NBS) programmes utilise information on a variety of clinical variables such as gestational age, sex, and birth weight to reduce false‐positive screens for inborn metabolic disorders. Here we study the influence of ethnicity on metabolic marker levels in a diverse newborn population. NBS data from screen‐negative singleton babies (n = 100 000) were analysed, which included blood metabolic markers measured by tandem mass spectrometry and ethnicity status reported by the parents. Metabolic marker levels were compared between major ethnic groups (Asian, Black, Hispanic, White) using effect size analysis, which controlled for group size differences and influence from clinical variables. Marker level differences found between ethnic groups were correlated to NBS data from 2532 false‐positive cases for four metabolic diseases: glutaric acidemia type 1 (GA‐1), methylmalonic acidemia (MMA), ornithine transcarbamylase deficiency (OTCD), and very long‐chain acyl‐CoA dehydrogenase deficiency (VLCADD). Overall, 79% of the metabolic markers (34 of 43) had ethnicity‐related differences. Compared to the other groups, Black infants had elevated GA‐1 markers (C5DC, Cohen's d = .37, P < .001); Hispanics had elevated MMA markers (C3, Cohen's d = .13, P < .001, and C3/C2, Cohen's d = .27, P < .001); and Whites had elevated VLCADD markers (C14, Cohen's d = .28, P < .001, and C14:1, Cohen's d = .22, P < .001) and decreased OTCD markers (citrulline, Cohen's d = −.26, P < .001). These findings correlated with the higher false‐positive rates in Black infants for GA‐1, in Hispanics for MMA, and in Whites for OTCD and for VLCADD. Web‐based tools are available to analyse ethnicity‐related changes in newborn metabolism and to support developing methods to identify false‐positives in metabolic screening.
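The effect sizes above are reported as Cohen's d, the difference between two group means divided by a pooled standard deviation. The sketch below shows the textbook pooled-SD form only. It does not reproduce the study's adjustments for group size differences or clinical variables, and the marker values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent samples, using the pooled sample variance."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical marker levels for two groups of newborns.
group_a = [2.0, 4.0, 6.0]
group_b = [1.0, 3.0, 5.0]
d = cohens_d(group_a, group_b)
print(d)
```

A positive d means the first group has the higher mean; swapping the arguments flips the sign.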
Blood collection for newborn genetic disease screening is preferably performed within 24–48 h after birth. We used population-level newborn screening (NBS) data to study early postnatal metabolic changes and whether timing of blood collection could impact screening performance. Newborns were grouped based on their reported age at blood collection (AaBC) into early (12–23 h), standard (24–48 h), and late (49–168 h) collection groups. Metabolic marker levels were compared between the groups using effect size analysis, which controlled for group size differences and influence from the clinical variables of birth weight and gestational age. Metabolite level differences identified between groups were correlated to NBS data from false-positive cases for inborn metabolic disorders including carnitine transport defect (CTD), isovaleric acidemia (IVA), methylmalonic acidemia (MMA), and phenylketonuria (PKU). Our results showed that 56% of the metabolites had AaBC-related differences, which included metabolites with either decreasing or increasing levels after birth. Compared to the standard group, the early-collection group had elevated marker levels for PKU (phenylalanine, Cohen's d = 0.55), IVA (C5, Cohen's d = 0.24), MMA (C3, Cohen's d = 0.23), and CTD (C0, Cohen's d = 0.23). These findings correlated with higher false-positive rates for PKU (P < 0.05), IVA (P < 0.05), and MMA (P < 0.001), and lower false-positive rate for CTD (P < 0.001) in the early-collection group. Blood collection before 24 h could affect screening performance for some metabolic disorders. We have developed web-based tools integrating AaBC and other variables for interpretive analysis of screening data.
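The three collection groups can be expressed as a simple binning rule on age at blood collection. The sketch below assumes AaBC is recorded in whole hours and treats ages outside 12 to 168 h as excluded. The function name and the sample ages are hypothetical.

```python
from collections import Counter


def aabc_group(age_hours):
    """Map age at blood collection (whole hours) to the group named in the abstract."""
    if 12 <= age_hours <= 23:
        return "early"
    if 24 <= age_hours <= 48:
        return "standard"
    if 49 <= age_hours <= 168:
        return "late"
    return None  # outside the studied range (assumption: such records are excluded)


# Hypothetical ages at blood collection, in hours.
ages = [14, 23, 24, 36, 48, 49, 100, 168, 6, 200]
group_counts = Counter(aabc_group(a) for a in ages)
print(group_counts)
```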
Summary: Large-scale, quantitative proteomics data are being generated at ever increasing rates by high-throughput, mass spectrometry technologies. However, due to the complexity of these large datasets as well as the increasing numbers of post-translational modifications (PTMs) that are being identified, developing effective methods for proteomic visualization has been challenging. ProteomicsBrowser was designed to meet this need for comprehensive data visualization. Using peptide information files exported from mass spectrometry search engines or quantitative tools as input, the peptide sequences are aligned to an internal protein database such as UniProtKB. Each identified peptide ion including those with PTMs is then visualized along the parent protein in the Browser. A unique property of ProteomicsBrowser is the ability to combine overlapping peptides in different ways to focus analysis of sequence coverage, charge state or PTMs. ProteomicsBrowser includes other useful functions, such as a data filtering tool and basic statistical analyses to qualify quantitative data. Availability and implementation: ProteomicsBrowser is implemented in Java 8 and is available at https://medicine.yale.edu/keck/nida/proteomicsbrowser.aspx and https://github.com/peng-gang/ProteomicsBrowser. Supplementary information: Supplementary data are available at Bioinformatics online.
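Aligning peptides to a parent protein and combining overlapping peptides to assess sequence coverage can be illustrated with exact substring matching. This is a minimal Python sketch of the general idea, not ProteomicsBrowser's Java implementation. The function name, the protein sequence, and the peptides are hypothetical.

```python
def sequence_coverage(protein, peptides):
    """Fraction of protein residues covered by at least one exactly matching peptide."""
    if not protein:
        raise ValueError("protein sequence is empty")
    covered = [False] * len(protein)
    for peptide in peptides:
        if not peptide:
            continue  # ignore empty entries
        start = protein.find(peptide)
        while start != -1:
            # Mark every residue spanned by this occurrence; overlaps merge naturally.
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return sum(covered) / len(protein)


# Hypothetical 22-residue protein and three identified peptides (two overlap).
protein = "MKTAYIAKQRQISFVKSHFSRQ"
peptides = ["TAYIAK", "IAKQR", "SHFSR"]
coverage = sequence_coverage(protein, peptides)
print(f"{coverage:.1%}")
```

Marking residues in a boolean mask keeps overlapping peptides from being counted twice. A real tool would also need to handle modified residues and peptides that match several proteins.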