For technologies that are commonly used in ordinary laboratories, such as fluorescence-polarization detection with template-directed dye-terminator incorporation (FP-TDI), SNP genotype scoring is usually done manually. Here we study the rates of errors and missing genotypes obtained with this procedure. We also introduce three statistical genotype scoring methods to examine whether they form a viable alternative. Data consisted of eight SNPs typed in about 1400 individuals from 268 pedigrees. The statistical procedures performed better on several internal criteria, such as the number of Mendelian errors, and showed much higher agreement with discrepant genotypes re-scored by two raters. The best results were obtained with the statistical procedure that incorporated information about regularities in the error structure of the FP-TDI data. We estimated that there were about 1.6% more errors if genotypes were scored manually. About 0.6% of these errors could be explained by data manipulation errors, leaving 1% as the result of possible incorrect scoring. There were 3.3% more missing genotypes in the manual scoring, due to errors in data manipulation (1.7%) and conservative scoring (1.6%).
Genome scans, including both genome-wide association studies and deep sequencing, continue to discover a growing number of significant association signals for various traits. However, variants meeting genome-wide significance criteria often explain far less of the overall trait variance than "sub-threshold" association signals. To extract these sub-threshold signals, there is a need for methods which accurately estimate the mean of all (normally distributed) test statistics from a genome scan (i.e., Z-scores). This is currently achieved by the difficult procedures of adjusting all Z-score ($\chi^2_1$) statistics for "winner's curse" (multiple testing). Given that multiple testing adjustments are much simpler for p-values, we propose a method for estimating Z-score means by i) first adjusting their p-values for multiple testing and then ii) transforming the adjusted p-values to upper-tail Z-scores with the sign of the original statistics. Because a False Discovery Rate (FDR) procedure is used for multiple testing adjustment, we denote this method FDR Inverse Quantile Transformation (FIQT). When compared to competitors, e.g., Empirical Bayes (including proposed improvements), FIQT is i) more accurate and ii) more computationally efficient by orders of magnitude. Its accuracy advantage is substantial at larger sample sizes and/or moderate numbers of association signals. Practical application of FIQT to Z-scores from the first Psychiatric Genetic Consortium (PGC) schizophrenia study predicts a non-trivial fraction of the significant signal regions from the subsequent published PGC schizophrenia studies. Finally, we suggest that FIQT might be i) used to improve subject-level risk prediction and ii) further improved by modelling the noncentrality of $\chi^2_1$ statistics.
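The two FIQT steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes two-sided p-values, the Benjamini-Hochberg procedure as the FDR adjustment, and a hypothetical floor `min_p` that keeps extremely small p-values from underflowing to zero. The function names (`two_sided_pvalues`, `bh_adjust`, `fiqt`) are chosen here for illustration only.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def two_sided_pvalues(z_scores, min_p=1e-300):
    """Two-sided p-values, p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).

    min_p is an assumed floor that avoids p-values of exactly zero.
    """
    return [max(math.erfc(abs(z) / math.sqrt(2.0)), min_p) for z in z_scores]


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def fiqt(z_scores, min_p=1e-300):
    """Sketch of FIQT: FDR-adjust p-values, then map back to signed Z-scores."""
    pvals = two_sided_pvalues(z_scores, min_p)
    adjusted = bh_adjust(pvals)
    shrunk = []
    for q, z in zip(adjusted, z_scores):
        # Upper-tail quantile of q/2 equals -Phi^{-1}(q/2), which is >= 0.
        magnitude = -_STD_NORMAL.inv_cdf(q / 2.0)
        shrunk.append(math.copysign(magnitude, z))  # keep the original sign
    return shrunk


if __name__ == "__main__":
    z = [5.0, 0.0, -3.0, 1.0]
    for original, estimate in zip(z, fiqt(z)):
        print(f"{original:+.2f} -> {estimate:+.4f}")
```

Because the adjusted p-values are never smaller than the raw ones, each output has an absolute value no larger than that of its input, which is the shrinkage toward zero that a winner's curse correction is meant to produce. The sketch uses only the standard library; a vectorized version for genome-scale input would be a natural extension.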