Today, several projects are working toward reducing inequities and improving health care for individuals affected with rare genetic diseases from diverse populations. One route to reduce inequities is to generate variant catalogues for diverse populations. To that end, we developed CAFE (Cohort Allele Frequency Estimation) pipeline, an open-source pipeline implemented in the NextFlow framework. CAFE pipeline includes detection of single nucleotide variants, small insertions and deletions, mitochondrial variants, structural variants, mobile element insertions, and short tandem repeats. Sample and variant quality control, allele frequency calculation (for whole and sex-stratified cohorts) and annotation steps are also included, delivering vcf files with annotated variants and their frequency in the cohort. Successful application of CAFE pipeline to 100 publicly available human genomes is described. We hope that, by making this pipeline available, more under-represented populations benefit from enhanced capacity to generate high-quality variant catalogues.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.