Brandon D. Pickett scite author profile

Background: Analyzing next-generation sequencing data is difficult because datasets are large, second-generation sequencing platforms have high error rates, and each position in the target genome (exome, transcriptome, etc.) is sequenced multiple times. Given these challenges, numerous bioinformatic algorithms have been developed to analyze these data. These algorithms aim to find an appropriate balance between data loss, errors, analysis time, and memory footprint. Typical analysis pipelines require multiple steps. If one or more of these steps is unnecessary, removing it would significantly decrease compute time and data manipulation. One step in many pipelines is PCR duplicate removal, where PCR duplicates arise from multiple PCR products from the same template molecule binding on the flowcell. These are often removed because there is concern they can lead to false positive variant calls. Picard (MarkDuplicates) and SAMTools (rmdup) are the two main software tools used for PCR duplicate removal.

Results: Approximately 92% of the 17+ million variants called were called whether we removed duplicates with Picard or SAMTools, or left the PCR duplicates in the dataset. There were no significant differences between the unique variant sets when comparing the transition/transversion ratios (p = 1.0), percentage of novel variants (p = 0.99), average population frequencies (p = 0.99), and the percentage of protein-changing variants (p = 1.0). Results were similar for variants in the American College of Medical Genetics genes. Genotype concordance between NGS and SNP chips was above 99% for all genotype groups (e.g., homozygous reference).

Conclusions: Our results suggest that PCR duplicate removal has minimal effect on the accuracy of subsequent variant calls.
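The two summary statistics mentioned in this abstract, the share of variants called under every duplicate-handling strategy and the transition/transversion ratio, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the `(chrom, pos, ref, alt)` tuple layout, and the choice to measure overlap as intersection over union are all assumptions made for illustration.

```python
# Hypothetical helpers; names and data layout are assumptions, not the paper's code.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(variants):
    """Transition/transversion ratio over single-nucleotide variants.

    `variants` is an iterable of (chrom, pos, ref, alt) tuples.
    Indels and multi-base alleles are skipped.
    """
    ti = tv = 0
    for _chrom, _pos, ref, alt in variants:
        if len(ref) != 1 or len(alt) != 1 or ref == alt:
            continue
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("inf")


def shared_fraction(*call_sets):
    """Fraction of all variants (union) that appear in every call set."""
    if not call_sets:
        return 0.0
    sets = [set(s) for s in call_sets]
    union = set().union(*sets)
    shared = set.intersection(*sets)
    return len(shared) / len(union) if union else 0.0


# Toy call sets standing in for the three strategies compared in the abstract.
picard = {("1", 100, "A", "G"), ("1", 200, "C", "A")}
samtools = {("1", 100, "A", "G"), ("1", 300, "G", "T")}
no_removal = picard | samtools

print(shared_fraction(picard, samtools, no_removal))
print(ti_tv_ratio(no_removal))
```

The toy data are invented; real comparisons would read the variants from VCF files produced by each pipeline.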

Lingering Taxonomic Challenges Hinder Conservation and Management of Global Bonefishes

Pickett, Wallace, Ridge, et al. 2020. Fisheries

Despite expanding research on the popular recreational fishery, bonefish taxonomy remains murky. The genus Albula, comprising these iconic circumtropical marine sportfishes, has a complex taxonomic history driven by highly conserved morphology. Presently, 12 putative species are spread among 3 species complexes. The cryptic morphology hinders visual identification, requiring genetic species identification in some cases. Unclear nomenclature can have unintended consequences, including exacerbating taxonomic uncertainty and complicating resolution efforts. Further, ignoring this reality in publications may erode management and conservation efforts. In the Indian and Pacific oceans, ranges and areas of overlap are unclear, precluding certainty about which species support the fishery and hindering conservation efforts. Species overlap, at both broad and localized spatial scales, may mask population declines if one is targeted primarily (as demonstrated in the western Atlantic fishery). Additional work is necessary, especially to increase our understanding of spatiotemporal ecology across life history stages and taxa. If combined with increased capacity to discern between cryptic species, population structure may be ascertained, and fisheries stakeholders will be enabled to make informed decisions. To assist in such efforts, we have constructed new range maps for each species and species complex. For bonefishes, conservation genomic approaches may resolve lingering taxonomic uncertainties, supporting effective conservation and management efforts. These methods apply broadly to taxonomic groups with cryptic diversity, aiding species delimitation and taxonomic revisions.

JustOrthologs: a fast, accurate and user-friendly ortholog identification algorithm

Miller, Pickett, Ridge. 2018

Kmer-SSR: a fast and exhaustive SSR search algorithm

Pickett, Miller, Ridge. 2017

Motivation: One of the main challenges with bioinformatics software is that the size and complexity of datasets necessitate trading speed for accuracy, or completeness. To combat this problem of computational complexity, a plethora of heuristic algorithms have arisen that report a ‘good enough’ solution to biological questions. However, in instances such as Simple Sequence Repeats (SSRs), a ‘good enough’ solution may not accurately portray results in population genetics, phylogenetics and forensics, which require accurate SSRs to calculate intra- and inter-species interactions.

Results: We present Kmer-SSR, which finds all SSRs faster than most heuristic SSR identification algorithms in a parallelized, easy-to-use manner. The exhaustive Kmer-SSR option has 100% precision and 100% recall and accurately identifies every SSR of any specified length. To identify more biologically pertinent SSRs, we also developed several filters that allow users to easily view a subset of SSRs based on user input. Kmer-SSR, coupled with the filter options, accurately and intuitively identifies SSRs quickly and in a more user-friendly manner than any other SSR identification algorithm.

Availability and implementation: The source code is freely available on GitHub at https://github.com/ridgelab/Kmer-SSR.
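To make the notion of an SSR concrete, here is a minimal Python sketch of a naive scan that reports runs of a motif repeated back to back. It is not the Kmer-SSR algorithm, whose internals the abstract does not describe; the function name, parameters, and the greedy left-to-right, non-overlapping reporting are assumptions chosen for brevity.

```python
def find_ssrs(seq, min_k=1, max_k=6, min_repeats=2):
    """Naive scan for tandem repeats (hypothetical helper, not Kmer-SSR).

    For each motif length k, walk the sequence left to right and report
    (start, motif, repeat_count) for every run of at least `min_repeats`
    consecutive copies. After a hit, scanning resumes past the run, so
    overlapping runs of the same motif length are not reported.
    """
    results = []
    n = len(seq)
    for k in range(min_k, max_k + 1):
        i = 0
        while i + k <= n:
            motif = seq[i:i + k]
            repeats = 1
            j = i + k
            # Extend the run while the next k bases equal the motif.
            while seq[j:j + k] == motif:
                repeats += 1
                j += k
            if repeats >= min_repeats:
                results.append((i, motif, repeats))
                i = j  # resume after the run
            else:
                i += 1
    return results


print(find_ssrs("TTACACACGG", min_k=2, max_k=2, min_repeats=3))
```

A production tool would also need the kind of filtering the abstract mentions (for example, discarding motifs that are themselves repeats of a shorter motif), which this sketch omits.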

SA-SSR: a suffix array-based algorithm for exhaustive and efficient SSR discovery in large genetic sequences

Pickett, Karlinsey, Penrod, et al. 2016

Summary: Simple Sequence Repeats (SSRs) are used to address a variety of research questions in a variety of fields (e.g. population genetics, phylogenetics, forensics), due to their high mutability within and between species. Here, we present an innovative algorithm, SA-SSR, based on suffix and longest common prefix arrays for efficiently detecting SSRs in large sets of sequences. Existing SSR detection applications are hampered by one or more limitations (e.g. speed, accuracy, ease-of-use). Our algorithm addresses these challenges while being the most comprehensive and correct SSR detection software available. SA-SSR is 100% accurate and detected >1000 more SSRs than the second best algorithm, while offering greater control to the user than any existing software.

Availability and implementation: SA-SSR is freely available at http://github.com/ridgelab/SA-SSR

Contact: perry.ridge@byu.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
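The two data structures named in this abstract, the suffix array and the longest common prefix (LCP) array, can be sketched in a few lines of Python. This only illustrates the structures themselves: how SA-SSR uses them to detect SSRs is not described here, and the function names are hypothetical. The suffix array is built by plain sorting for clarity, and the LCP array uses Kasai's algorithm.

```python
def suffix_array(s):
    """Start positions of all suffixes of `s`, in lexicographic order.

    Built by direct sorting, which is simple but slow on long sequences.
    """
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s, sa):
    """Kasai's algorithm: lcp[r] is the length of the longest common
    prefix of the suffixes at sa[r] and sa[r - 1]; lcp[0] is 0.
    """
    n = len(s)
    rank = [0] * n
    for r, start in enumerate(sa):
        rank[start] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]  # suffix just before suffix i in sorted order
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1  # the next suffix shares at least h - 1 characters
        else:
            h = 0
    return lcp


sa = suffix_array("banana")
print(sa)
print(lcp_array("banana", sa))
```

Long LCP values between suffixes that are close together in the text indicate repeated content, which is the kind of signal a repeat finder can exploit.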
