Bo Dong scite author profile

Scientific experiments and large-scale simulations produce massive amounts of data. Many of these scientific datasets are arrays, and are stored in file formats such as HDF5 and NetCDF. Although scientific data management systems, such as SciDB, are designed to manipulate arrays, there are challenges in integrating these systems into existing analysis workflows. Major barriers include the expensive task of preparing and loading data before querying, and converting the final results to a format that is understood by the existing post-processing and visualization tools. As a consequence, integrating a data management system into an existing scientific data analysis workflow is time-consuming and requires extensive user involvement. In this paper, we present the design of a new scientific data analysis system that efficiently processes queries directly over data stored in the HDF5 file format. This design choice eliminates the tedious and error-prone data loading process, and makes the query results readily available to the next processing steps of the analysis workflow. Our design leverages the increasing main memory capacities found in supercomputers through bitmap indexing and in-memory query execution. In addition, query processing over the HDF5 data format can be effortlessly parallelized to utilize the ample concurrency available in large-scale supercomputers and modern parallel file systems. We evaluate the performance of our system on a large supercomputing system and experiment with both a synthetic dataset and a real cosmology observation dataset. Our system frequently outperforms the relational database system that the cosmology team currently uses, and is more than 10× faster than Hive when processing data in parallel. Overall, by eliminating the data loading step, our query processing system is more effective in supporting in situ scientific analysis workflows.

Data Elevator: Low-Contention Data Movement in Hierarchical Storage System

Dong

Byna

et al. 2016

Flexural strength and Weibull analysis of Y-TZP fabricated by stereolithographic additive manufacturing and subtractive manufacturing

Mei

Zhang

et al. 2020

Journal of the European Ceramic Society

Deep learning for automatic cell detection in wide-field microscopy zebrafish images

Dong

Shao

Costa

et al. 2015

The zebrafish has become a popular experimental model organism for biomedical research. In this paper, a framework is proposed for automatically detecting Tyrosine Hydroxylase-containing (TH-labeled) cells in larval zebrafish brain z-stack images recorded with a wide-field microscope. In this framework, a supervised max-pooling Convolutional Neural Network (CNN) is trained to detect cell pixels in regions that are preselected by a Support Vector Machine (SVM) classifier. The results show that the proposed deep-learned method outperforms hand-crafted techniques and demonstrate its potential for automatic cell detection in wide-field microscopy z-stack zebrafish images.

Automated Quality Assessment of Cardiac MR Images Using Convolutional Neural Networks

Parallel data analysis directly on scientific file formats
