2018
DOI: 10.1186/s12859-018-2121-6

A statistical approach to identify, monitor, and manage incomplete curated data sets

Abstract: Background: Many biological knowledge bases gather data through expert curation of published literature. High data volume, selective partial curation, delays in access, and publication of data prior to the ability to curate it can result in incomplete curation of published data. Knowing which data sets are incomplete and how incomplete they are remains a challenge. Awareness that a data set may be incomplete is important for proper interpretation and for avoiding flawed hypothesis generation, and can justify further…

Cited by 2 publications (2 citation statements)
References 17 publications
“…The feature data of class A features are already available in the original dataset. They do not need subsequent processing, while the feature data of class B features are obtained after extraction and corresponding calculation from the original dataset based on business logic relationships (Bi and Wang, 2019; Howe, 2018; Viegas et al, 2017). Figure 2 presents an explanation of the meaning of class A and class B features:…”
Section: Analysis Of Feature Engineering (mentioning)
confidence: 99%
“…It is becoming a key task, given that expert-curated web-accessible databases are one of the main driving forces in current research in biology in general and bioinformatics in particular [4]. The responsibilities of curators may include data collection; consistency, incompleteness [5] and accuracy control; annotation using widely accepted nomenclatures; or evaluation of computational analysis, amongst others. Biocuration requires broad expertise in the domain because of the vast amount of heterogeneous information available from literature, often lacking a unified and standardized approach for the representation and analysis of data.…”
Section: Introduction (mentioning)
confidence: 99%