Recent progress in computing, machine learning, and especially deep learning for image recognition has had a significant impact on the automatic detection of various diseases from chest X-ray images (CXRs). Here, the efficiency of lung segmentation and bone shadow exclusion techniques is demonstrated for the analysis of 2D CXRs with a deep learning approach, to help radiologists identify suspicious lesions and nodules in lung cancer patients. Training and validation were performed on the original JSRT dataset (dataset #01); the BSE-JSRT dataset, i.e. the same JSRT dataset but without clavicle and rib shadows (dataset #02); the original JSRT dataset after lung segmentation (dataset #03); and the BSE-JSRT dataset after lung segmentation (dataset #04). The results demonstrate the high efficiency and usefulness of the considered pre-processing techniques even in this simplified configuration. The pre-processed dataset without bone shadows (dataset #02) shows much better accuracy and loss than the other pre-processed datasets obtained after lung segmentation (datasets #03 and #04).
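The comparison described above amounts to training the same classifier on each pre-processed variant of the JSRT data and comparing validation accuracy and loss. The following is a minimal sketch of that experiment, assuming the four variants have been exported as NumPy archives; the file names and the small CNN are illustrative assumptions, not the architecture or data layout used in the study.

```python
# Minimal sketch: train the same small CNN on each pre-processed JSRT variant
# and compare validation accuracy/loss. File names and the architecture are
# illustrative assumptions, not the configuration used in the study.
import numpy as np
from tensorflow.keras import layers, models

def build_cnn(input_shape=(256, 256, 1)):
    """A deliberately simple CNN classifier (illustrative only)."""
    model = models.Sequential([
        layers.Conv2D(16, 3, activation="relu", input_shape=input_shape),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),   # nodule present / absent
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Hypothetical .npz files, one per dataset variant described above,
# each expected to contain x with shape (N, H, W, 1) and y with shape (N,).
variants = {
    "#01 original JSRT": "jsrt_original.npz",
    "#02 BSE-JSRT (bones excluded)": "jsrt_bse.npz",
    "#03 JSRT + lung segmentation": "jsrt_segmented.npz",
    "#04 BSE-JSRT + lung segmentation": "jsrt_bse_segmented.npz",
}

for name, path in variants.items():
    data = np.load(path)
    x, y = data["x"], data["y"]
    model = build_cnn(input_shape=x.shape[1:])
    history = model.fit(x, y, validation_split=0.2, epochs=20, verbose=0)
    print(f"{name}: val_acc={history.history['val_accuracy'][-1]:.3f} "
          f"val_loss={history.history['val_loss'][-1]:.3f}")
```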
We present the results of chest X-ray (CXR) analysis of 2D images aimed at statistically reliable predictions of the presence of tuberculosis by computer-aided diagnosis (CADx) based on deep learning. The results demonstrate the efficiency of lung segmentation and of lossless and lossy data augmentation for CADx of tuberculosis with a deep convolutional neural network (CNN), even when applied to a small and poorly balanced dataset. The CNN is able to train (despite overfitting) on the pre-processed dataset obtained after lung segmentation, in contrast to the original, non-segmented dataset. Lossless data augmentation of the segmented dataset yields the lowest validation loss (without overfitting) and nearly the same accuracy (within one standard deviation) as the original and the other pre-processed datasets after lossy data augmentation. Additional limited lossy data augmentation lowers the validation loss further, but at the cost of a decrease in validation accuracy. In conclusion, beyond more complex deep CNNs and bigger datasets, further progress of CADx even for small and poorly balanced datasets could be obtained by better segmentation, data augmentation, dataset stratification, and exclusion of non-evident outliers.
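To make the distinction between the two augmentation regimes concrete, the sketch below generates "lossless" variants (transforms that only re-index pixels, such as flips and 90-degree rotations) and "lossy" variants (transforms that interpolate, such as small arbitrary rotations and shifts) of a single CXR array. The interpretation of the two regimes and all parameter values are assumptions for illustration; the exact augmentation settings of the study are not reproduced here.

```python
# Sketch of lossless vs. lossy data augmentation for a single-channel
# CXR array `img` of shape (H, W). Parameter values are illustrative.
import numpy as np
from scipy.ndimage import rotate, shift

def augment_lossless(img):
    """Yield pixel-preserving variants: flips and 90-degree rotations."""
    for k in range(4):                       # 0, 90, 180, 270 degrees
        r = np.rot90(img, k)
        yield r
        yield np.fliplr(r)

def augment_lossy(img, max_angle=5.0, max_shift=8.0):
    """Yield interpolated variants: small random rotations and shifts."""
    rng = np.random.default_rng(0)
    for _ in range(4):
        angle = rng.uniform(-max_angle, max_angle)
        dy, dx = rng.uniform(-max_shift, max_shift, size=2)
        out = rotate(img, angle, reshape=False, order=1, mode="nearest")
        out = shift(out, (dy, dx), order=1, mode="nearest")
        yield out

if __name__ == "__main__":
    img = np.random.rand(256, 256)           # stand-in for a segmented CXR
    print(len(list(augment_lossless(img))), "lossless variants")
    print(len(list(augment_lossy(img))), "lossy variants")
```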
The basic features of some of the most versatile and popular open-source frameworks for machine learning (TensorFlow, Deep Learning4j, and H2O) are considered and compared. A comparative analysis was performed and conclusions were drawn about the advantages and disadvantages of these platforms. Performance tests on the de facto standard MNIST dataset were carried out with the H2O framework for deep learning algorithms on CPU and GPU platforms in single-threaded and multi-threaded modes of operation. We also present the results of testing neural network architectures on the H2O platform for various activation functions, stopping metrics, and other parameters of the machine learning algorithm. For the use case of the MNIST database of handwritten digits in single-threaded mode, it was demonstrated that blind selection of these parameters can increase the runtime enormously (by 2-3 orders of magnitude) without a significant increase in precision. This result can be crucial for the optimization of available and new machine learning methods, especially for image recognition problems.
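The kind of parameter sweep described above can be expressed with H2O's Python API as follows: the same deep learning model is trained on MNIST with different activation functions and stopping metrics while the wall-clock runtime and validation log loss are recorded. This is a minimal sketch, not the grid used in the study; the CSV paths and the label column name are hypothetical placeholders.

```python
# Sketch: sweep activation functions and stopping metrics of the H2O deep
# learning estimator on MNIST and compare runtime vs. validation log loss.
import time
import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.init()

# Hypothetical local CSVs with 784 pixel columns and a label column "C785".
train = h2o.import_file("mnist_train.csv")
valid = h2o.import_file("mnist_test.csv")
response = "C785"
train[response] = train[response].asfactor()
valid[response] = valid[response].asfactor()
predictors = [c for c in train.columns if c != response]

for activation in ["Rectifier", "Tanh", "Maxout"]:
    for stopping_metric in ["misclassification", "logloss"]:
        model = H2ODeepLearningEstimator(
            activation=activation,
            hidden=[128, 128],
            epochs=10,
            stopping_metric=stopping_metric,
            stopping_rounds=3,
            stopping_tolerance=0.01,
            seed=1,
        )
        t0 = time.time()
        model.train(x=predictors, y=response,
                    training_frame=train, validation_frame=valid)
        runtime = time.time() - t0
        perf = model.model_performance(valid)
        print(f"{activation:9s} {stopping_metric:17s} "
              f"runtime={runtime:6.1f}s  logloss={perf.logloss():.4f}")

h2o.cluster().shutdown()
```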