2016
DOI: 10.18637/jss.v071.c02

SNPMClust: Bivariate Gaussian Genotype Clustering and Calling for Illumina Microarrays

Abstract: SNPMClust is an R package for genotype clustering and calling with Illumina microarrays. It was originally developed for studies using the GoldenGate custom genotyping platform but can be used with other Illumina platforms, including Infinium BeadChip. The algorithm first rescales the fluorescent signal intensity data, adds empirically derived pseudo-data to minor allele genotype clusters, then uses the package mclust for bivariate Gaussian model fitting. We compared the accuracy and sensitivity of SNPMClust t…

citations
Cited by 2 publications
(2 citation statements)
references
References 13 publications
“…The WGA product was quantified using TaqMan RNase Reagent Kit [ 89 ]. Genotype clustering and calling was conducted using a previously developed and tested genotype calling algorithm, SNPMClust, that was developed in-house at the Arkansas Center for Birth Defects Research and Prevention [ 90 ]. To ensure high-quality genotypes, we applied stringent quality control measures and excluded SNPs with poor clustering behavior, no-call rates >10%, greater Mendelian error rates >5%, MAF <5%, or significant deviation from Hardy-Weinberg equilibrium in at least one racial group.…”
Section: Methods (mentioning)
confidence: 99%
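The quality-control thresholds quoted in this statement (no-call rate >10%, MAF <5%, deviation from Hardy-Weinberg equilibrium) can be illustrated with a small sketch. This is not code from the citing study or from SNPMClust: the `SnpCounts` container, the function names, and the chi-square cutoff of 3.841 (1 degree of freedom, alpha = 0.05) are assumptions made for illustration. The statement does not say which test or significance level was used, and the Mendelian error filter is left out because it needs family data. The study also applied the equilibrium check per racial group, whereas this sketch treats a single group.

```python
from dataclasses import dataclass


@dataclass
class SnpCounts:
    """Genotype counts for one SNP (hypothetical container for this sketch)."""
    n_aa: int
    n_ab: int
    n_bb: int
    n_nocall: int


def no_call_rate(c: SnpCounts) -> float:
    """Fraction of samples that received no genotype call."""
    total = c.n_aa + c.n_ab + c.n_bb + c.n_nocall
    return c.n_nocall / total if total else 1.0


def minor_allele_frequency(c: SnpCounts) -> float:
    """Frequency of the rarer allele among called genotypes."""
    called = c.n_aa + c.n_ab + c.n_bb
    if called == 0:
        return 0.0
    freq_a = (2 * c.n_aa + c.n_ab) / (2 * called)
    return min(freq_a, 1.0 - freq_a)


def hwe_chi_square(c: SnpCounts) -> float:
    """Pearson chi-square statistic against Hardy-Weinberg expected counts."""
    n = c.n_aa + c.n_ab + c.n_bb
    if n == 0:
        return 0.0
    p = (2 * c.n_aa + c.n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (c.n_aa, c.n_ab, c.n_bb)
    # Skip classes with zero expected count (monomorphic SNPs).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def passes_qc(c: SnpCounts, max_no_call: float = 0.10,
              min_maf: float = 0.05, max_chi2: float = 3.841) -> bool:
    """Apply the three count-based filters; max_chi2 is an assumed cutoff."""
    return (no_call_rate(c) <= max_no_call
            and minor_allele_frequency(c) >= min_maf
            and hwe_chi_square(c) <= max_chi2)
```

For example, `passes_qc(SnpCounts(25, 50, 25, 0))` keeps a SNP whose counts match Hardy-Weinberg proportions exactly, while a SNP with no heterozygotes at all, such as `SnpCounts(50, 0, 50, 0)`, is rejected.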
“…The data from a genotyping assay are bivariate, representing the quantitative levels of fluorescent intensities of two probes designed to capture each of the two alleles: the stronger the intensity, the stronger is the signal for that allele. Clustering of the data from multiple individuals is used to partition samples into the three possible genotypes, either with a proprietary clustering algorithm built into the genotyping platforms, such as Illumina (Zhao et al 2018), or a (usually Gaussian) mixture model (Erickson & Callaway 2016). The typical shapes of the cluster components, volume of the data (both SNPs and numbers of individuals), along with experimental noise and instrumental limitations, complicate this process, leading to numerous errors, that then have to be corrected through a labour-intensive procedure of manual curation, magnified by the sheer order of repetition for the hundreds of thousands of SNPs.…”
Section: Genotype Identification (mentioning)
confidence: 99%
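The calling step described in this statement, assigning a two-channel intensity pair to one of three genotype clusters under a Gaussian mixture model, can be sketched as follows. This is a minimal illustration and not SNPMClust's implementation. It assumes the mixture parameters have already been fitted (SNPMClust delegates fitting to mclust), and the component values, the `"NC"` no-call label, and the 0.95 posterior threshold are all hypothetical.

```python
import math


def bivariate_normal_pdf(x, y, mean, cov):
    """Density of a bivariate Gaussian; cov is ((sxx, sxy), (sxy, syy))."""
    mx, my = mean
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = x - mx, y - my
    # Quadratic form (v - mean)^T cov^{-1} (v - mean), written out for 2x2.
    q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))


def call_genotype(x, y, components, min_posterior=0.95):
    """Return (label, posterior) for the most probable mixture component.

    components maps a genotype label to (weight, mean, cov).
    The label is "NC" (no call) when the best posterior is below the threshold.
    """
    weighted = {
        label: w * bivariate_normal_pdf(x, y, mean, cov)
        for label, (w, mean, cov) in components.items()
    }
    total = sum(weighted.values())
    if total == 0.0:
        return "NC", 0.0
    best = max(weighted, key=weighted.get)
    posterior = weighted[best] / total
    return (best if posterior >= min_posterior else "NC"), posterior


# Hypothetical fitted clusters on rescaled (allele A, allele B) intensities.
COV = ((0.01, 0.0), (0.0, 0.01))
COMPONENTS = {
    "AA": (0.49, (1.0, 0.1), COV),
    "AB": (0.42, (0.6, 0.6), COV),
    "BB": (0.09, (0.1, 1.0), COV),
}
```

With these assumed parameters, `call_genotype(1.0, 0.1, COMPONENTS)` lands at the centre of the AA cluster and is called `"AA"`, whereas a point halfway between the AA and AB centres, such as `(0.8, 0.35)`, is returned as `"NC"` because neither cluster reaches the posterior threshold. Ambiguous points of this kind are the ones the statement describes as needing manual curation.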