2011
DOI: 10.1109/tcbb.2009.83

Identifying Relevant Data for a Biological Database: Handcrafted Rules versus Machine Learning

Abstract: With well over one thousand specialized biological databases in use today, the task of automatically identifying novel, relevant data for such databases is increasingly important. In this paper, we describe practical machine learning approaches for identifying MEDLINE documents and Swiss-Prot/TrEMBL protein records, for incorporation into a specialized biological database of transport proteins named TCDB. We show that both learning approaches outperform rules created by hand by a human expert. As one of the fi…

Cited by 11 publications
(6 citation statements)
references
References 30 publications
“…We have: Incorporated an improved administration page, built-in semi-automatic machine learning tools ( 11 ) and user contributions, allowing protein history tracking, see Wakabayashi et al ( 10 ). Updated software to BLAST 2.2.27.…”
Section: Recent Technical Improvements (2011–13)
Citation type: mentioning
confidence: 99%
“…But the tree constructed may become too large and it has Oversensivity to training set. Practical machine learning approaches for identification of MEDLINE documents and Swiss-Prot/TrEMBL protein records have been described in [9]. This also involves incorporation of those into TCDB which is a biological repository for transport proteins.…”
Section: Literature Survey
Citation type: mentioning
confidence: 99%
“…Whenever a con-causal relation between different data sources exists, but it is not clear how to design and parametrize a direct algorithm to exploit such relation, resorting to some machine learning technique is a natural choice. In fact, given a reasonable feature selection, learning techniques have proven to be often more effective in classification tasks than manually crafted solution that exploit a direct knowledge of the problem domain [3,30]. We decided to address the issue of blob-device association in terms of a non-probabilistic binary classification problem.…”
Section: Identification By Learning
Citation type: mentioning
confidence: 99%