2012
DOI: 10.1142/s0219720012500035

Adjusted Geometric-Mean: A Novel Performance Measure for Imbalanced Bioinformatics Datasets Learning

Abstract: One common and challenging problem faced by many bioinformatics applications, such as promoter recognition, splice site prediction, RNA gene prediction, drug discovery and protein classification, is the imbalance of the available datasets. In most of these applications, the positive data examples are largely outnumbered by the negative data examples, which often leads to the development of sub-optimal prediction models having high negative recognition rate (Specificity = SP) and low positive recognition rate (…

Cited by 51 publications
(35 citation statements)
References 21 publications
“…Because the BF data is an imbalanced 3-class dataset, in order to objectively evaluate the performance, the Adjusted Geometric Mean (AGM) (Batuwita & Palade 2012) is selected as the main metric for evaluating the prediction performance. The AGM is an extension of the GM (geometric mean) metric, which in turn is calculated from the four basic statistical measures (True Positive (TP), False Positive (FP), True Negative (TN) and False Negative (FN)) with the following formulas:…”
Section: Outcomes and Comparison (mentioning)
confidence: 99%
“…Supervised machine learning approaches for bioinformatics problems have been widely used (Liu et al, 2012; Chen, 2008; Wang and Wu, 2006; Yu et al, 2013; Erdoğdu et al, 2013; Jiang et al, 2013; Rider et al, 2014; Huang, 2013). The problem of identifying splice sites using machine learning techniques has also been addressed, mostly by supervised methods (Baten et al, 2006; Baten et al, 2007; Sonnenburg et al, 2007; Castelo and Guigó, 2004; Batuwita and Palade, 2012). For example, in , the authors present a state-of-the-art method using SVM and an RBF kernel for human splice site detection.…”
Section: Related Work (mentioning)
confidence: 99%
“…Large-margin based classifiers, such as SVM, would be impractical, due to their large number of parameters that need tuning and longer computational times as compared to Naïve Bayes. Similar to Batuwita and Palade (2012), we also used under- and SMOTE over-sampling. Wei and Dunbrack (2013) explored the effects that balancing both the training and test datasets have on the SVM algorithm.…”
Section: Related Work (mentioning)
confidence: 99%
“…The same authors also proposed a novel measure for evaluating the learning of supervised classifiers on imbalanced bioinformatics datasets, namely the 'adjusted geometric-mean' (Batuwita and Palade, 2012). In this work, the authors conducted experiments on ten DNA (including splice sites) and protein datasets, with imbalance degrees of up to 1-to-14 and dataset sizes with up to 10K instances.…”
Section: Related Work (mentioning)
confidence: 99%