Data imbalance is common in most medical image analysis problems and may become more important as data-hungry deep learning paradigms grow in popularity. We explore cutting-edge Wasserstein generative adversarial networks (WGANs) to address the data imbalance problem by oversampling the minority classes. A WGAN can estimate the underlying distribution of a minority class and synthesize more plausible and helpful samples for the classification model. In this paper, the WGAN-based oversampling technique is applied to augment and balance the data for the fine-grained classification of seven semantic attributes of lung nodules in computed tomography images. The fine-grained classification is carried out with a standard convolutional neural network (CNN). To further illustrate the efficacy of the WGAN-based oversampling technique, we implement three alternatives for comparison: the conventional data augmentation method used in many deep learning works, generative adversarial networks (GANs), and deep convolutional generative adversarial networks (DCGANs). The complete minority-oversampling and fine-grained classification schemes are tested on the public Lung Image Database Consortium (LIDC) dataset. The experimental results suggest that the WGAN-based oversampling technique can synthesize helpful samples for the minority classes. These samples assist the training of the CNN model and boost fine-grained classification performance more than the conventional data augmentation method and the GAN and DCGAN schemes do. The WGAN technique may therefore offer an alternative methodological option for further deep learning studies on imbalanced classification.
INDEX TERMS Computer-aided diagnosis (CAD), lung nodule, computed tomography (CT), synthetic minority over-sampling, deep learning, data imbalance, adversarial neural networks.
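The core quantities behind WGAN-based oversampling can be sketched in a few lines. The sketch below assumes the original WGAN formulation: the critic maximizes the gap between its mean score on real and on generated samples, and its weights are clipped to a small range. The abstract does not say whether weight clipping or a gradient penalty was used. The balancing rule `samples_to_synthesize`, which tops every class up to the size of the largest one, and the class labels are hypothetical illustrations, not the paper's stated procedure. Real training would compute the critic scores with a neural network rather than pass them in as lists.

```python
from statistics import mean


def critic_loss(real_scores, fake_scores):
    """WGAN critic objective: maximize E[D(x_real)] - E[D(x_fake)].

    Returned as a quantity to *minimize*, i.e. E[D(x_fake)] - E[D(x_real)].
    """
    return mean(fake_scores) - mean(real_scores)


def generator_loss(fake_scores):
    """The generator minimizes -E[D(G(z))], pushing critic scores on fakes upward."""
    return -mean(fake_scores)


def clip_weights(weights, c=0.01):
    """Original-WGAN weight clipping to [-c, c], a crude way to keep the critic Lipschitz."""
    return [max(-c, min(c, w)) for w in weights]


def samples_to_synthesize(class_counts):
    """Hypothetical balancing rule: top every class up to the size of the largest one."""
    target = max(class_counts.values())
    return {label: target - n for label, n in class_counts.items()}


# Hypothetical critic scores for a mini-batch of real and generated minority-class samples.
print(critic_loss([1.0, 3.0], [0.5, 0.5]))  # -1.5
print(generator_loss([0.5, 0.5]))  # -0.5
print(clip_weights([0.5, -0.02, 0.005]))  # [0.01, -0.01, 0.005]
print(samples_to_synthesize({"class_0": 100, "class_1": 30, "class_2": 55}))
# {'class_0': 0, 'class_1': 70, 'class_2': 45}
```

A trained generator would then be sampled the returned number of times per minority class, and the synthetic samples would be added to that class's training set before the CNN is fit.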
With the advent of 'big data', the ability to handle large datasets has become critical to the success of industrial organizations such as Google, Amazon, Yahoo! and Facebook. As an important cloud computing framework for bulk data processing, Hadoop is widely used in these organizations. However, the performance of MapReduce is seriously limited by its rigid configuration strategy. Even for a single simple Hadoop job, users must set a large number of tuning parameters, and misconfiguration can easily lead to performance loss. In this paper, we present an adaptive automatic configuration tool (AACT) that optimizes Hadoop performance. To this end, we propose a mathematical model that accurately learns the relationship between system performance and configuration parameters, and we then configure the Hadoop system based on this model. With AACT, Hadoop can dynamically adapt to its hardware and software configuration and converge to an optimal configuration in acceptable time. Experimental results demonstrate AACT's efficiency and adaptability and show that it runs ten times faster than the default configuration.
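The abstract does not specify AACT's mathematical model. The sketch below swaps in one plausible choice, a least-squares regression of job runtime on configuration parameters, followed by a search over candidate configurations for the lowest predicted runtime. The quadratic feature expansion, the two integer knobs, and the runtimes are hypothetical. The knobs stand in for, say, a reduce-task count and a sort-buffer size, and the runtimes are generated from a made-up formula rather than measured on Hadoop.

```python
from itertools import product


def features(config):
    """Expand (p1, p2, ...) into [1, p1, p1^2, p2, p2^2, ...] (hypothetical quadratic model)."""
    row = [1.0]
    for p in config:
        row += [float(p), float(p) ** 2]
    return row


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_model(configs, runtimes):
    """Least-squares fit via the normal equations (X^T X) w = X^T y."""
    rows = [features(c) for c in configs]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, runtimes)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, config):
    """Predicted runtime of one configuration under the fitted model."""
    return sum(w * f for w, f in zip(weights, features(config)))


def best_config(weights, grid):
    """Pick the candidate configuration with the lowest predicted runtime."""
    return min(grid, key=lambda c: predict(weights, c))


# Hypothetical knobs: p1 in 1..8, p2 in 1..4; runtimes from a made-up quadratic.
grid = list(product(range(1, 9), range(1, 5)))
runtimes = [50 + 3 * (p1 - 5) ** 2 + 10 * (p2 - 3) ** 2 for p1, p2 in grid]
weights = fit_model(grid, runtimes)
print(best_config(weights, grid))  # (5, 3)
```

The quadratic terms let the model express an interior optimum. A purely linear model would always place the best configuration at a corner of the search grid, which is rarely realistic for tuning parameters.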
Sequence alignment algorithms are a basic and critical component of many bioinformatics fields. With the rapid development of sequencing technology, fast-growing reference databases and longer query sequences pose new challenges for sequence alignment, because the alignment algorithm has prohibitively high time and space complexity. In this paper, we present DSA, a scalable distributed sequence alignment system. DSA employs Spark to process sequence data in a horizontally scalable distributed environment. It also leverages a data-parallel strategy based on Single Instruction Multiple Data (SIMD) instructions to parallelize the algorithm on each core of a worker node. The experimental results demonstrate that 1) DSA has outstanding performance, achieving up to 201x speedup over SparkSW, and 2) DSA has excellent scalability, achieving near-linear speedup as the number of nodes in the cluster increases.
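Assuming the underlying algorithm is Smith-Waterman local alignment (as the comparison with SparkSW suggests), the distribution pattern can be sketched as follows. The reference database is split into partitions. Each partition is scored against the query in a "map" step, and the per-partition winners are combined in a "reduce" step. Plain Python lists stand in for Spark partitions, and SIMD vectorization is not shown. The scoring parameters (match +2, mismatch -1, linear gap -2) are illustrative assumptions.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local-alignment score of a vs b (linear gap); O(len(a)*len(b)) time, O(len(b)) memory."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def partition(database, n):
    """Round-robin split of (index, sequence) pairs, standing in for Spark partitions."""
    indexed = list(enumerate(database))
    return [indexed[k::n] for k in range(n)]


def search_partition(query, part):
    """'Map' step: best (score, index) within one partition; (0, -1) if empty."""
    return max(((smith_waterman(query, seq), idx) for idx, seq in part), default=(0, -1))


def distributed_search(query, database, n_partitions=2):
    """'Reduce' step: combine per-partition winners into the global best hit."""
    partials = [search_partition(query, p) for p in partition(database, n_partitions)]
    return max(partials)


db = ["TTTT", "GGACGTCC", "ACG"]
print(distributed_search("ACGT", db, n_partitions=2))  # (8, 1)
```

Because each partition is scored independently, adding workers (partitions) divides the alignment work without changing the result. This property underlies the near-linear scaling the abstract reports, though real speedups also depend on data shuffling and scheduling overhead.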