2020
DOI: 10.12688/f1000research.28033.1

HGNChelper: identification and correction of invalid gene symbols for human and mouse

Abstract: Gene symbols are recognizable identifiers for gene names but are unstable and error-prone due to aliasing, manual entry, and unintentional conversion by spreadsheets to date format. Official gene symbol resources such as HUGO Gene Nomenclature Committee (HGNC) for human genes and the Mouse Genome Informatics project (MGI) for mouse genes provide authoritative sources of valid, aliased, and outdated symbols, but lack a programmatic interface and correction of symbols converted by spreadsheets. We present HGNChe…
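
To make the spreadsheet problem concrete, the following is a minimal sketch of how date-mangled symbols could be flagged. It is not HGNChelper's implementation or API: the helper name `looks_date_mangled` and the regular expression are assumptions made for illustration, and the sketch only detects strings shaped like spreadsheet dates (for example `1-Mar`, which is what a spreadsheet may turn `MARCH1` into) without attempting to correct them.

```python
import re

# Three-letter month abbreviations as spreadsheets typically render them.
_MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Matches "1-Mar" or "Mar-1" style strings (day first or month first).
_DATE_LIKE = re.compile(
    rf"^(?:\d{{1,2}}-(?:{_MONTHS})|(?:{_MONTHS})-\d{{1,2}})$",
    re.IGNORECASE,
)


def looks_date_mangled(symbol: str) -> bool:
    """Hypothetical helper: True if the string looks like a spreadsheet date."""
    return bool(_DATE_LIKE.match(symbol.strip()))


if __name__ == "__main__":
    for s in ["1-Mar", "2-Sep", "TP53", "MARCH1"]:
        print(s, looks_date_mangled(s))
```

A flagged value still needs a lookup against an authoritative symbol table to recover the intended gene, which is the part the abstract attributes to the package.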

Citations: cited by 25 publications (27 citation statements)
References: 11 publications
“…In studies where raw count matrices were unavailable (Couturier2020, Bhaduri2020), BAM files were converted to FASTQ files and re-aligned to GRCh38 using CellRanger v3.1/4.0. All gene names were updated to the latest HUGO nomenclature using HGNChelper (Oh et al, 2020). All clinical/diagnostic metadata was harmonized and preserved.…”
Section: Methods | Citation type: mentioning
confidence: 99%
“…Aligned sequences were processed using the Seurat package (version 4.0.2) in R [33]. Gene names between experiments were correlated using the HGNChelper package [34], using the suggested gene symbol for each gene except when it would create a duplicate reference. Genes were filtered from individual runs if they did not appear in three or more cells.…”
Section: Data Processing and Integration | Citation type: mentioning
confidence: 99%
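
The rule quoted above (take the suggested symbol unless doing so would create a duplicate) can be sketched as follows. This is an illustrative reading of that sentence, not the citing authors' code and not part of HGNChelper: the function `apply_suggestions` and the `suggested` dictionary are hypothetical, and the mapping would in practice come from a symbol-checking step.

```python
def apply_suggestions(symbols, suggested):
    """Rename each symbol to its suggestion unless that name is already in use.

    symbols:   list of gene symbols as they appear in the data (assumed unique)
    suggested: dict mapping a symbol to its suggested current symbol
               (missing or None means "no change"); hypothetical input
    """
    taken = set(symbols)  # every name currently in use
    result = []
    for sym in symbols:
        new = suggested.get(sym) or sym
        if new != sym and new in taken:
            # The suggestion would collide with another gene: keep the original.
            result.append(sym)
        else:
            # Safe to rename (or nothing to change): release old name, claim new.
            taken.discard(sym)
            taken.add(new)
            result.append(new)
    return result


if __name__ == "__main__":
    genes = ["MARCH1", "MARCHF1", "SEPT2", "TP53"]
    mapping = {"MARCH1": "MARCHF1", "SEPT2": "SEPTIN2"}
    print(apply_suggestions(genes, mapping))
    # ['MARCH1', 'MARCHF1', 'SEPTIN2', 'TP53']
```

The sketch is deliberately conservative: a symbol is left unchanged whenever its suggested name already appears anywhere in the list, which keeps row names unique at the cost of leaving some outdated symbols in place.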
“…The set of evolutionarily conserved human miRNA binding sites was developed by Agarwal et al using context++ model prediction and downloaded from the TargetScan database 7.1 (June 2016 release) [10]. HGNChelper R package (version 0.8.1) and updated reference map via function getCurrentHumanMap were used to update obsolete gene symbols and historical aliases to current gene symbols maintained by The HUGO Gene Nomenclature Committee (HGNC) database [27]. The dataset contains 116,371 predicted miRNA binding sites in the 3′-UTRs of 12,436 human genes.…”
Section: Methods | Citation type: mentioning
confidence: 99%