2017
DOI: 10.1101/217745
Preprint
The Block Object Storage Service (bossDB): A Cloud-Native Approach for Petascale Neuroscience Discovery

Abstract: Large volumetric neuroimaging datasets have grown in size over the past ten years from gigabytes to terabytes, with petascale data becoming available and more common over the next few years. Current approaches to store and analyze these emerging datasets are insufficient in their ability to scale in both cost-effectiveness and performance. Additionally, enabling large-scale processing and annotation is critical as these data grow too large for manual inspection. We propose a new cloud-native managed service fo…

Cited by 13 publications (25 citation statements)
References 16 publications
“…For ease of use, DICED is a Python wrapper that makes a DVID store behave as if it were a NumPy array [74]. bossDB uses microservices (AWS Lambda) to fulfill cutout requests from an image in Amazon S3 cloud storage [75]. ndstore shares common roots with bossDB [76], but uses Amazon EC2 servers rather than microservices.…”
Section: Distributed Chunk-wise Processing Of Large Images
Mentioning (confidence: 99%)
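The access pattern this quote describes for bossDB, serving arbitrary subvolume "cutouts" assembled from fixed-size blocks held in object storage, can be sketched in a few lines. The block shape, integer block keys, and the in-memory dict standing in for S3 objects are all illustrative assumptions for this sketch, not bossDB's actual storage layout or API.

```python
import itertools
import numpy as np

BLOCK = (16, 16, 16)  # hypothetical cuboid shape (z, y, x); bossDB's real size differs

def block_ids(start, stop, block=BLOCK):
    """Indices of every block touched by the half-open cutout [start, stop)."""
    ranges = [range(s // b, (e - 1) // b + 1)
              for s, e, b in zip(start, stop, block)]
    return list(itertools.product(*ranges))

def get_cutout(store, start, stop, block=BLOCK):
    """Assemble an arbitrary subvolume from fixed-size blocks in `store`."""
    out = np.zeros([e - s for s, e in zip(start, stop)], dtype=np.uint8)
    for bid in block_ids(start, stop, block):
        blk = store.get(bid)
        if blk is None:                      # absent block reads as zeros
            continue
        origin = [i * b for i, b in zip(bid, block)]   # block's absolute corner
        # overlap of the block and the cutout, in each coordinate frame
        src = tuple(slice(max(s, o) - o, min(e, o + b) - o)
                    for s, e, o, b in zip(start, stop, origin, block))
        dst = tuple(slice(max(s, o) - s, min(e, o + b) - s)
                    for s, e, o, b in zip(start, stop, origin, block))
        out[dst] = blk[src]
    return out
```

For instance, a 16-voxel cutout offset by 8 voxels on each axis touches eight blocks and is stitched from whichever of them exist; in bossDB each block fetch would be an independent request that a microservice can serve in parallel.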
“…A single SBFSEM volumetric reconstruction may create an image stack comprising thousands of micrographs, which can occupy hundreds of gigabytes, and this would represent only a fraction of the data needed to describe the reach of one pyramidal neuron. For large-scale ultrastructural studies, automated analysis tools and data storage solutions are needed to facilitate the processing of large imaging datasets collected from many individuals across broad swaths of the brain. While obstacles in access to tissue samples, image collection time, and data storage are nontrivial, we highlight these imaging techniques because of their unique potential for providing direct evidence of structural disruptions producing pathology.…”
Section: Imaging Brain Structure At Subcellular Resolution With Scann…
Mentioning (confidence: 99%)
“…For large-scale ultrastructural studies, automated analysis tools and data storage solutions are needed to facilitate the processing of large imaging datasets collected from many individuals across broad swaths of the brain [156]. While obstacles in access to tissue samples, image collection time, and data storage are nontrivial, we highlight these imaging techniques because of their unique potential for providing direct evidence of structural disruptions producing pathology. Further, we hope that appraisal of both the promise and limitations associated with these approaches can accelerate their development and eventual application.…”
Section: Imaging Brain Structure At Subcellular Resolution With Scann…
Mentioning (confidence: 99%)
“…Large neuroimaging datasets are distinct from many canonical big data solutions because researchers typically analyze a few (often one) very large datasets instead of many individual images. Custom storage solutions [19,51] exist, but often require tools, knowledge, and access patterns that are disparate from those used by many neuroscience laboratories. SABER provides tools to connect to specialized neuroimaging databases which integrate into CWL tool pipelines.…”
Section: Cloud Computation and Storage
Mentioning (confidence: 99%)
“…SABER introduces canonical pipelines for EM and XRM, specified in CWL, with a library of dockerized tools. These tools are deployed with the workflow execution engine Apache Airflow [16], using Amazon Web Services (AWS) Batch to scale compute resources, and with imaging data stored in the volumetric database bossDB [19]. Metadata, parameters, and tabular results are logged using the neuroimaging database Datajoint [20].…”
Section: Introduction
Mentioning (confidence: 99%)
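The orchestration pattern this quote describes, steps executed in dependency order while every step's parameters and results are logged, can be illustrated with a toy runner. This is a stand-in for the Airflow-plus-DataJoint combination, not SABER's actual API, and the two-step "pipeline" at the bottom is hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Toy stand-in for the pattern in the quote: a workflow engine runs steps in
# dependency order (here: plain callables instead of dockerized CWL tools)
# while each step's parameters and result are logged (a list, not DataJoint).
log: List[Tuple[str, dict, object]] = []

def run_pipeline(steps: Dict[str, Tuple[List[str], Callable]], params: dict):
    """steps maps name -> (dependency names, fn(results_so_far, params))."""
    results: Dict[str, object] = {}
    done: set = set()
    while len(done) < len(steps):
        progressed = False
        for name, (deps, fn) in steps.items():
            if name in done or any(d not in done for d in deps):
                continue
            results[name] = fn(results, params)
            log.append((name, dict(params), results[name]))  # metadata log
            done.add(name)
            progressed = True
        if not progressed:
            raise ValueError("cyclic or unsatisfiable dependencies")
    return results

# Hypothetical two-step EM pipeline: fetch a "cutout", then run a "detector".
steps = {
    "cutout": ([], lambda r, p: list(range(p["size"]))),
    "detect": (["cutout"], lambda r, p: len(r["cutout"])),
}
```

Running `run_pipeline(steps, {"size": 4})` executes "cutout" before "detect" and leaves a queryable record of what ran with which parameters, which is the role DataJoint plays in SABER.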