A statistical approach to identify, monitor, and manage incomplete curated data sets

Howe, Douglas G.

doi:10.1186/s12859-018-2121-6

Cited by 2 publications

(2 citation statements)

References 17 publications


“…The feature data of class A features are already available in the original dataset. They do not need subsequent processing, while the feature data of class B features are obtained after extraction and corresponding calculation from the original dataset based on business logic relationships (Bi and Wang, 2019; Howe, 2018; Viegas et al., 2017). Figure 2 presents an explanation of the meaning of class A and class B features:…”

Section: Analysis of Feature Engineering (mentioning; confidence: 99%)

The Use of an Internet of Things Data Management System Using Data Mining Association Algorithm in an E-Commerce Platform

Wang, Zhang, Gao, et al. 2023

Journal of Organizational and End User Computing


The development of e-commerce has greatly changed social retail formats, and the business-to-consumer (B2C) e-commerce model has become especially important. Because consumer trust is high and the goods sold are dominated by electronic products and branded commodities, the income and profits it generates are considerable. The major e-commerce giants have therefore expanded their B2C formats. Logistics service capability and level have become an important driving force for the development of B2C e-commerce. How to optimize B2C e-commerce inventory and achieve an organic balance between economy and service capacity across the whole logistics chain has become an urgent problem for these companies. From a big-data perspective, an overview of the dataset, built from real operational data of a B2C e-commerce platform, is first analyzed.
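The class A / class B distinction in the quoted passage can be illustrated with a small Python sketch. Everything in it is hypothetical: the `Order` fields stand in for class A features that are read directly from the raw dataset, and `class_b_features` applies made-up business rules (order value, delivery delay) to derive class B features. Neither the fields nor the rules come from the cited studies.

```python
from dataclasses import dataclass


# Hypothetical order record: each field is a "class A" feature, i.e. it is
# taken directly from the raw dataset without further processing.
@dataclass
class Order:
    order_id: str
    quantity: int
    unit_price: float
    promised_days: int
    actual_days: int


def class_b_features(order: Order) -> dict:
    """Derive hypothetical "class B" features from class A fields.

    The business rules below are illustrative assumptions, not the rules
    used in any of the cited papers.
    """
    return {
        # Order value: quantity multiplied by unit price.
        "order_value": order.quantity * order.unit_price,
        # Delivery delay in days, floored at zero for early deliveries.
        "delay_days": max(0, order.actual_days - order.promised_days),
        # Flag: was the order delivered after the promised time?
        "is_late": order.actual_days > order.promised_days,
    }


if __name__ == "__main__":
    order = Order("A-001", quantity=3, unit_price=19.5, promised_days=2, actual_days=4)
    print(class_b_features(order))
```

The point of the split is that class B columns are recomputable from class A columns, so only the rules, not the derived values, need to be curated.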


“…It is becoming a key task, given that expert-curated web-accessible databases are one of the main driving forces in current research in biology in general and bioinformatics in particular [4]. The responsibilities of curators may include data collection; consistency, incompleteness [5] and accuracy control; annotation using widely accepted nomenclatures; or evaluation of computational analysis, amongst others. Biocuration requires broad expertise in the domain because of the vast amount of heterogeneous information available from literature, often lacking a unified and standardized approach for the representation and analysis of data.…”

Section: Introduction (mentioning; confidence: 99%)

Using machine learning tools for protein database biocuration assistance

König, Shaim, Vellido, et al. 2018

Sci Rep


Biocuration in the omics sciences has become paramount, as research in these fields rapidly evolves towards increasingly data-dependent models. As a result, the management of web-accessible, publicly available databases has become a central task in biological knowledge dissemination. One relevant challenge for biocurators is the unambiguous identification of biological entities. In this study, we illustrate the adequacy of machine learning methods as biocuration assistance tools using a publicly available protein database as an example. This database contains information on G Protein-Coupled Receptors (GPCRs), which are part of eukaryotic cell membranes and relevant in cell communication as well as major drug targets in pharmacology. These receptors are characterized according to subtype labels. Previous analysis of this database provided evidence that some of the receptor sequences could be affected by label noise, as they appeared to be too consistently misclassified by machine learning methods. Here, we extend our analysis to recent, substantially modified versions of the database and, using several machine learning models and different transformations of the unaligned sequences, show that their labeling is now extremely accurate. These findings support the adequacy of our proposed method as a biocuration tool for identifying problematic labeling cases.
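The abstract's idea that sequences misclassified too consistently by several models may signal label noise can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' procedure: the function name `flag_suspect_labels`, the `min_agreement` threshold, and the toy sequence ids and subtype labels are all hypothetical, and the predictions are assumed to come from models that did not train on the item being scored (for example, out-of-fold cross-validation predictions).

```python
from collections import Counter


def flag_suspect_labels(labels, predictions_by_model, min_agreement=1.0):
    """Flag items whose curated label is consistently contradicted by classifiers.

    labels: dict mapping item id -> curated label.
    predictions_by_model: one dict per model, mapping item id -> predicted label
        (assumed to be out-of-sample predictions).
    min_agreement: fraction of models that must predict the same alternative
        label for an item to be flagged (1.0 means every model).
    Returns a dict mapping flagged item id -> (curated label, suggested label).
    """
    n_models = len(predictions_by_model)
    suspects = {}
    for item, label in labels.items():
        # Collect every prediction that disagrees with the curated label.
        wrong = [preds[item] for preds in predictions_by_model if preds[item] != label]
        if not wrong:
            continue
        # Most common alternative label and the number of models that chose it.
        alternative, count = Counter(wrong).most_common(1)[0]
        if count / n_models >= min_agreement:
            suspects[item] = (label, alternative)
    return suspects


if __name__ == "__main__":
    # Hypothetical toy data: three sequences scored by three models.
    labels = {"seq1": "A", "seq2": "B", "seq3": "C"}
    predictions = [
        {"seq1": "A", "seq2": "C", "seq3": "C"},
        {"seq1": "A", "seq2": "C", "seq3": "A"},
        {"seq1": "B", "seq2": "C", "seq3": "C"},
    ]
    print(flag_suspect_labels(labels, predictions))
```

Requiring the models to agree on the same alternative label, rather than merely to disagree with the curated one, keeps items that are confused in scattered ways from being flagged; lowering `min_agreement` trades precision for recall when building a list for curators to review.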
