Motivation: PSORTb has remained the most precise bacterial protein subcellular localization (SCL) predictor since it was first made available in 2003. However, its recall needs to be improved, and no accurate SCL predictor yet makes predictions for archaea or differentiates important localization subcategories, such as proteins targeted to a host cell or to bacterial hyperstructures/organelles. Such improvements should preferably be encompassed in a freely available web-based predictor that can also be used as a standalone program.
Results: We developed PSORTb version 3.0 with improved recall, higher proteome-scale prediction coverage, and new refined localization subcategories. It is the first SCL predictor specifically geared for all prokaryotes, including archaea and bacteria with atypical membrane/cell wall topologies. It features an improved standalone program, with a new batch results delivery system complementing its web interface. We evaluated the most accurate SCL predictors using 5-fold cross-validation and an independent proteomics analysis, showing that PSORTb 3.0 is the most accurate but can benefit from being complemented by Proteome Analyst predictions.
Availability: http://www.psort.org/psortb (download open source software or use the web interface).
Contact: psort-mail@sfu.ca
Supplementary Information: Supplementary data are available at Bioinformatics online.
This paper provides an overview of the probability sample designs and sampling methods for the Collaborative Psychiatric Epidemiology Studies (CPES): the National Comorbidity Survey Replication (NCS-R), the National Study of American Life (NSAL) and the National Latino and Asian American Study of Mental Health (NLAAS). The multi-stage sample design and respondent selection procedures used in these three studies are based on the University of Michigan Survey Research Center's National Sample designs and operations. The paper begins with a general overview of these designs and procedures and then turns to a more detailed discussion of the adaptation of these general methods to the three specific study designs. The detailed discussions of the individual study samples focus on design characteristics and outcomes that are important to analysts of the CPES data sets and to researchers and statisticians who are planning future studies. The paper describes how the expected survey cost and error structure for each of these surveys influenced the original design of the samples and how actual field experience led to changes and adaptations to arrive at the final samples of each survey population.
Background: DNA methylation plays an essential role in the regulation of gene expression. While its presence near the transcription start site of a gene has been associated with reduced expression, the variation in methylation levels across individuals, its environmental or genetic causes, and its association with gene expression remain poorly understood.
Results: We report the joint analysis of sequence variants, gene expression and DNA methylation in primary fibroblast samples derived from a set of 62 unrelated individuals. Approximately 2% of the most variable CpG sites are mappable in cis to sequence variation, usually within 5 kb. Using eQTL analysis of microarray data combined with mapping of allelic expression regions, we obtained a set of 2,770 regions mappable in cis to sequence variation. In 9.5% of these expressed regions, an associated SNP was also a methylation QTL. Methylation and gene expression are often correlated without direct discernible involvement of sequence variation, but not always in the expected direction (negative for promoter CpGs and positive for gene-body CpGs). Population-level correlation between methylation and expression is strongest in a subset of developmentally significant genes, including all four HOX clusters. The presence and sign of this correlation are best predicted using specific chromatin marks rather than the position of the CpG site with respect to the gene.
Conclusions: Our results indicate a wide variety of relationships between gene expression, DNA methylation and sequence variation in untransformed adult human fibroblasts, with considerable involvement of chromatin features and some discernible involvement of sequence variation.
Non-response weighting is a commonly used method to adjust for bias due to unit non-response in surveys. Theory and simulations show that, to reduce bias effectively without increasing variance, a covariate used for non-response weighting adjustment needs to be highly associated with both the response indicator and the survey outcome variable. In practice, these requirements pose an often-overlooked challenge, because such covariates are frequently unobserved or may not exist. Surveys have recently begun to collect supplementary data, such as interviewer observations and other proxy measures of key survey outcome variables. To the extent that these auxiliary variables are highly correlated with the actual outcomes, they are promising candidates for non-response adjustment. In the present study, we examine traditional covariates and new auxiliary variables for the National Survey of Family Growth, the Medical Expenditure Panel Survey, the American National Election Survey, the European Social Surveys and the University of Michigan Transportation Research Institute survey. We provide empirical estimates of the association between proxy measures and both response to the survey request and the actual survey outcome variables. We also compare unweighted and weighted estimates under various non-response models. Our results from multiple surveys with multiple recruitment protocols from multiple organizations on multiple topics show the difficulty of finding suitable covariates for non-response adjustment and the need to improve the quality of auxiliary data.
(Kreuter et al., Journal of the Royal Statistical Society, Series A, 173 (2010).)
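The requirement that a weighting covariate be related to both response and outcome can be illustrated with a simple weighting-class adjustment, in which each respondent is weighted by the inverse of the response rate in its adjustment cell. This is a minimal sketch on a made-up, simulated sample, assuming equal base design weights; the field names (`cell`, `noise`, `responded`, `y`) and helper functions are hypothetical and do not reproduce the authors' analysis or their non-response models. In the toy data the outcome is constant within each `cell`, which is the ideal case for that covariate.

```python
from collections import Counter


def class_weights(sample, key):
    """Non-response weight per adjustment class: n_c / r_c.

    Assumes equal base design weights, so a respondent in cell c gets the
    inverse of that cell's response rate. Cells with no respondents get no
    weight (their share of the sample is simply lost in this sketch).
    """
    n = Counter(u[key] for u in sample)
    r = Counter(u[key] for u in sample if u["responded"])
    return {cell: n[cell] / r[cell] for cell in r}


def respondent_means(sample, key):
    """Return (unweighted, weighted) respondent means of the outcome y."""
    resp = [u for u in sample if u["responded"]]
    w = class_weights(sample, key)
    unweighted = sum(u["y"] for u in resp) / len(resp)
    total_w = sum(w[u[key]] for u in resp)
    weighted = sum(w[u[key]] * u["y"] for u in resp) / total_w
    return unweighted, weighted


def make_sample():
    """Hypothetical simulated sample; y is known for everyone only because it is simulated."""
    sample = []
    # Cell A: high response rate (8/10), outcome 1.
    for i in range(10):
        sample.append({"cell": "A", "responded": i < 8, "y": 1})
    # Cell B: low response rate (2/10), outcome 0.
    for i in range(10):
        sample.append({"cell": "B", "responded": i < 2, "y": 0})
    # Auxiliary variable unrelated to response: alternating labels give
    # both X and Y a response rate of 5/10.
    for i, unit in enumerate(sample):
        unit["noise"] = "X" if i % 2 == 0 else "Y"
    return sample


sample = make_sample()
true_mean = sum(u["y"] for u in sample) / len(sample)
print(true_mean, respondent_means(sample, "cell"), respondent_means(sample, "noise"))
# 0.5 (0.8, 0.5) (0.8, 0.8)
```

Weighting on `cell`, which drives both response and the outcome, moves the respondent mean from 0.8 back to the full-sample value of 0.5. Weighting on `noise`, which is unrelated to response, yields equal weights and leaves the bias untouched.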