2018
DOI: 10.1101/258822
Preprint

BDQC: a general-purpose analytics tool for domain-blind validation of Big Data

Abstract: Translational biomedical research is generating exponentially more data: thousands of whole-genome sequences (WGS) are now available; brain data are doubling every two years. Analyses of Big Data, including imaging, genomic, phenotypic, and clinical data, present qualitatively new challenges as well as opportunities. Among the challenges is a proliferation in ways analyses can fail, due largely to the increasing length and complexity of processing pipelines. Anomalies in input data, runtime resource exhaustion…
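
The abstract is truncated here and does not describe BDQC's actual method or interface. Purely as a rough illustration of what "domain-blind" validation can involve, the following is a minimal sketch, assuming it amounts to computing format-agnostic per-file statistics and flagging statistical outliers across a file collection; the function names (file_stats, flag_outliers) are hypothetical and are not BDQC's API.

```python
# Hypothetical sketch of domain-blind validation: collect a few format-agnostic
# statistics per file, then flag files whose statistics deviate strongly from
# the rest of the collection. This is NOT BDQC's implementation or API.
import math
import os
import sys
from statistics import mean, stdev

def file_stats(path):
    """Format-agnostic per-file statistics: size, line count, byte entropy."""
    size = os.path.getsize(path)
    counts = [0] * 256
    lines = 0
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            lines += chunk.count(b"\n")
            for byte in chunk:
                counts[byte] += 1
    total = sum(counts) or 1
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c)
    return {"size": size, "lines": lines, "entropy": entropy}

def flag_outliers(stats_by_file, z_threshold=3.0):
    """Flag files whose z-score on any statistic exceeds the threshold."""
    flagged = {}
    for key in ("size", "lines", "entropy"):
        values = [s[key] for s in stats_by_file.values()]
        if len(values) < 2:
            continue
        mu, sigma = mean(values), stdev(values)
        if sigma == 0:
            continue
        for path, s in stats_by_file.items():
            z = abs(s[key] - mu) / sigma
            if z > z_threshold:
                flagged.setdefault(path, []).append((key, round(z, 1)))
    return flagged

if __name__ == "__main__":
    # Usage: python validate.py file1 file2 ... fileN
    stats = {p: file_stats(p) for p in sys.argv[1:]}
    for path, reasons in flag_outliers(stats).items():
        print(f"ANOMALY {path}: {reasons}")
```

The intent of the domain-blind framing, as the title suggests, is that such checks do not require knowing the file format or scientific domain: anomalies are defined only relative to the rest of the dataset.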

Cited by 7 publications (4 citation statements) | References 7 publications

Citation statements:
“…As a partial way to mitigate this deficiency, we recommend performing global dataset comparisons using genome fingerprints, and other general-purpose [32] or domain-specific metrics. Such relative benchmarking, in which each individual genome can serve as its own reference, can supplement absolute benchmarking relative to truth sets.…”
Section: Discussion | Citation type: mentioning | Confidence: 99%
“…We observe that such verification may be insufficient for global evaluation of large genome datasets including samples from diverse population backgrounds, which may be differentially affected by reference and software changes. As a partial way to mitigate this deficiency, we recommend performing global dataset comparisons using genome fingerprints and other general-purpose [9] or domain-specific metrics. Such 'relative benchmarking', in which each individual genome can serve as its own reference, can supplement 'absolute benchmarking' relative to truth sets.…”
Section: Discussion | Citation type: mentioning | Confidence: 99%
“…As a partial way to mitigate this deficiency, we recommend performing global dataset comparisons using genome fingerprints, and other general-purpose [27] or domain-specific metrics. Such ‘relative benchmarking’, in which each individual genome can serve as its own reference, can supplement ‘absolute benchmarking’ relative to truth sets.…”
Section: Discussion | Citation type: mentioning | Confidence: 99%
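
The ‘relative benchmarking’ described in the statements above, in which each genome is compared against the rest of the dataset rather than against an external truth set, might look roughly like the sketch below. The fingerprint_of and similarity functions here are placeholders for illustration and are not the published genome-fingerprint method; only the all-vs-all comparison and outlier-flagging step is being illustrated.

```python
# Rough sketch of 'relative benchmarking': compare each sample against all
# others and flag samples whose average similarity to the rest of the dataset
# is unusually low. The fingerprint below is a placeholder (hashed variant
# counts), not the published genome-fingerprint method.
from statistics import mean, stdev

def fingerprint_of(variants, bins=64):
    """Placeholder fingerprint: hash each variant string into a fixed-size,
    normalized count vector so genomes of different sizes stay comparable."""
    vec = [0] * bins
    for v in variants:
        vec[hash(v) % bins] += 1
    total = sum(vec) or 1
    return [c / total for c in vec]

def similarity(fp_a, fp_b):
    """Cosine similarity between two fingerprint vectors."""
    dot = sum(a * b for a, b in zip(fp_a, fp_b))
    norm_a = sum(a * a for a in fp_a) ** 0.5
    norm_b = sum(b * b for b in fp_b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def relative_benchmark(fingerprints, z_threshold=2.0):
    """Each genome serves as its own reference: flag genomes whose mean
    similarity to the rest of the dataset is a statistical outlier."""
    names = list(fingerprints)
    if len(names) < 3:
        return {}
    mean_sims = {
        n: mean(similarity(fingerprints[n], fingerprints[m])
                for m in names if m != n)
        for n in names
    }
    mu, sigma = mean(mean_sims.values()), stdev(mean_sims.values())
    return {n: s for n, s in mean_sims.items()
            if sigma and abs(s - mu) / sigma > z_threshold}

if __name__ == "__main__":
    # Toy demo with synthetic variant lists; sample_C should be flagged.
    genomes = {
        "sample_A": [f"chr1:{i}:A>G" for i in range(0, 1000, 3)],
        "sample_B": [f"chr1:{i}:A>G" for i in range(0, 1000, 3)],
        "sample_C": [f"chr2:{i}:C>T" for i in range(0, 1000, 7)],
    }
    fps = {name: fingerprint_of(v) for name, v in genomes.items()}
    print(relative_benchmark(fps, z_threshold=1.0))
```
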
“…Small data analyses can be implemented effectively via R or Python scripts, that can be executed on a workstation or a cloud-hosted virtual machine and then shared as documents or via notebook environments such as Jupyter [23]. Big data analyses can be more challenging to implement and share, due to the need to orchestrate the execution of multiple application programs on many processors in order to process large quantities of data in a timely manner, whether for quality control [24], computation of derived quantities, or other purposes.…”
Section: Globus Genomics For Parallel Cloud-based Computation | Citation type: mentioning | Confidence: 99%